SIMPLE 3-STEP CENSORED QUANTILE REGRESSION
AND EXTRAMARITAL AFFAIRS

VICTOR  CHERNOZHUKOV  AND  HAN  HONG 

Abstract. This paper suggests simple 3- and 4-step estimators for censored quantile regression
models with an envelope or a separation restriction on the censoring probability. The
estimators are theoretically attractive (asymptotically as efficient as Powell's celebrated
censored least absolute deviation estimator). At the same time, they are conceptually simple
and computationally trivial. They are especially useful in samples of small size
or models with many regressors, with desirable finite-sample properties and small bias. The
envelope restriction costs a small reduction of generality relative to the canonical censored
regression quantile model, yet its main plausible features remain intact. The estimator can
also be used to estimate a large class of traditional models, including the normal Amemiya-Tobin
model and many accelerated failure and proportional hazard models. The main empirical
example involves a very large data set on extramarital affairs, with a high degree (68%) of censoring. We
estimate the 45% to 90% conditional quantiles. The effects of covariates are not representable as
location shifts. Less religious women, with fewer children and higher status, tend to engage
in affairs relatively more than their opposites, especially at the extremes. The marriage
longevity effect is positive at moderately high quantiles and negative at high quantiles. The
education and marriage happiness effects are negative, especially at the extremes. We also
briefly consider survival quantile regression on the Stanford heart transplant data, where we
estimate the age and prior surgery effects across survival quantiles.


1. Introduction

In statistics, biostatistics, and econometrics, there has been a great deal of attention given
to censored data. This paper analyzes censored quantile regression models with known censoring
points, suggesting a very simple 3-step estimation procedure with congenial features.
This simplicity is achieved through the structured envelope and separation restrictions on the
censoring probability, yielding an easily implementable and well-behaved technique. These
restrictions preserve the plausible semi-parametric, distribution-free and heteroscedastic features
of the model. We illustrate the procedure in two examples, an extramarital affairs
example and the Stanford heart transplant data with complete censoring times.

The paper is organized as follows. We first evaluate the censored quantile regression (CQR) model,
first proposed by Powell (1986), as a generalization of the Lehmann-Doksum p-sample quantile
treatment problem for the case of censored data and general treatment. CQR models are semiparametric,
with distribution-free character and equivariance to monotone transformations.
They embody rich information about shifts in location, scale, and other moments induced by
covariates, all summarized as the quantile treatment effects. Ensembles of quantile regression
curves provide a more complete, often more interesting picture of relationships in the data
than conventional mean or median regressions alone. Graphically a spectacular technique,
they rouse curiosity and engage attention. CQR models compare favorably to the well-known
Amemiya-Tobin, Cox, Buckley-James, and other approaches, as they permit distribution-free
specifications as well as rich forms of heteroscedasticity, including scale and non-scale
forms. We briefly discuss available estimators for the cases of fixed censoring. This discussion
motivates the k-step estimators on congeniality grounds, including implementation ease, good
performance in small samples and models of many continuous or discrete regressors, and cases
of very high censoring. These qualities especially pertain to the empirical examples, which
follow after the discussion of theoretical properties and simulations. The k-step CQR offers
a constructive, robust, and well-behaved method to estimate the CQR models as well as the
traditional Amemiya-Tobin, accelerated failure, and many Cox models.

2.  Censored  Regression  Quantile  Models 

2.1. Censored Quantile Regression and Quantile Treatment Effects. Although the
need for conditional quantile models had been recognized early in the 19th century, only
in the past century did extensive research on the subject begin. Lehmann (1974) and
Doksum (1974) formulated the theory of quantile-quantile plots, posing a p-sample quantile
treatment problem and arguing that location-shift models are insufficient to summarize
ubiquitous quantile shift effects. Koenker and Bassett (1978) introduced quantile regression
(QR) estimators that have evolved into a popular approach to data analysis. Another early
work of high merit by Hogg (1975) suggested instrumental variable estimators and gave a
first empirical illustration of the conditional quantile model's breadth. A number of seminal
works also laid the foundation of the initial developments, including Amemiya (1981),
Chaudhuri (1991), Chaudhuri, Doksum, and Samarov (1997), Powell (1986), Koenker and
Portnoy (1987), Portnoy (1991), Jureckova and Prochazka (1994), Newey and Powell (1990),
Buchinsky and Hahn (1998), Koenker and Machado (1999), Khan and Powell (2001), as well
as many others. QR is central to a substantial number of empirical studies, as Koenker and
Hallock (2001) recently reviewed.^1

Our target is the conditional quantile function of the real dependent variable $Y$ given
covariates $X$ in $\mathbb{R}^d$, $Q_{Y|X}$. $Q_{Y|X}$ is the inverse of the conditional distribution function $F_{Y|X}$:

$$Q_{Y|X}(\tau) = \inf\{y : F_{Y|X}(y) \ge \tau\};$$

therefore $Q_{Y|X}$ is a complete description of the stochastic relation of $Y$ to $X$.
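As a minimal illustration of the inverse-distribution definition above, the following Python sketch computes $\inf\{y : F_n(y) \ge \tau\}$ for an empirical distribution $F_n$. The function name `empirical_quantile` and the small numerical tolerance are illustrative choices, not part of the paper.

```python
import math


def empirical_quantile(sample, tau):
    """Return inf{y : F_n(y) >= tau}, with F_n the empirical distribution function."""
    if not 0 < tau <= 1:
        raise ValueError("tau must lie in (0, 1]")
    ys = sorted(sample)
    # Smallest k with k/n >= tau; the tolerance guards against products
    # such as 0.7 * 10 evaluating slightly above 7 in floating point.
    k = max(1, math.ceil(tau * len(ys) - 1e-12))
    return ys[k - 1]
```

For the sample (4, 1, 3, 2) and $\tau = 0.3$, $F_n(1) = 0.25 < 0.3 \le F_n(2) = 0.5$, so the function returns 2.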

^1 Koenker and Geling (2001) give a very worthwhile introduction to quantile regression.

The linear model of $Q_{Y|X}$ is of fundamental importance. It is convenient, conceptually
appealing, and simple, incorporating the classical linear model and linear location-scale models
as important special cases,

$$Q_{Y|X}(\tau) = X'\beta(\tau). \tag{1}$$

We assume that $X$ includes a constant and note that it may incorporate a wide array of
polynomial and alternative transformations of the observable covariates. This model could
be specified for a particular quantile, say the median, or for an ensemble of quantile curves,
providing a more complete, global description of the conditional distribution. Local (single
quantile restriction) models and global models each have their own virtues, which we
shall stress en route. In the sequel we assume that (1) applies to the quantiles of interest. The linear
QR model (1) is a natural generalization of the classical location-shift model

$$Y = X'\alpha + u, \tag{2}$$

where $u$ is independent of $X$, with distribution function $F$, and of the linear location-scale
shift model

$$Y = X'\alpha + X'\gamma \, u, \tag{3}$$

where $X'\gamma > 0$. Indeed, in the first case $\beta(\tau) = \alpha + F^{-1}(\tau) e_1$, where $e_1 = (1, 0, \ldots)'$, and in
the second $\beta(\tau) = \alpha + \gamma F^{-1}(\tau)$. Thus, (2) implies that all the slope coefficients are the same
for all $\tau$, whereas (3) implies that $\beta(\tau)$ are monotone in the quantile index $\tau$. The general
QR model (1), local or global, does not impose such restrictions. Thus, QR models ascribe a
rich role to covariates $X$, allowing them to exert location, scale, kurtosis, and quantile-shift
effects. As is the case with conditional mean models, the QR models are distribution free.
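To make the two special cases concrete, here is a small Python sketch of the coefficient functions implied by the location-shift and location-scale models. It assumes, purely for illustration, that the error distribution $F$ is standard normal; the function names and the default `dist` argument are hypothetical.

```python
from statistics import NormalDist


def beta_location_shift(tau, alpha, dist=NormalDist()):
    """beta(tau) = alpha + F^{-1}(tau) * e_1: only the intercept moves with tau."""
    z = dist.inv_cdf(tau)
    return [alpha[0] + z] + list(alpha[1:])


def beta_location_scale(tau, alpha, gamma, dist=NormalDist()):
    """beta(tau) = alpha + gamma * F^{-1}(tau): every coefficient is monotone in tau."""
    z = dist.inv_cdf(tau)
    return [a + g * z for a, g in zip(alpha, gamma)]
```

With `alpha = [1.0, 2.0]` and `gamma = [1.0, 0.5]`, the slope stays at 2.0 for every `tau` under the location-shift model, while under the location-scale model it increases with `tau` because the second component of `gamma` is positive.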

Another principal view of QR is as a way to extend the Lehmann-Doksum p-sample quantile
treatment problem to the regression setting (Koenker and Geling (2001)). In the two-sample
setting, Lehmann (1974) and Doksum (1974) arrived at the following formulation. Suppose
that where the value of the untreated response is $y$, the treatment adds the amount $\Delta(y)$. If $Y$ has the untreated
distribution $F$, then the treated distribution, $G$, is defined by the random variable $Y + \Delta(Y)$.
Thus, Lehmann defined the treatment effect $\Delta(y)$ as the horizontal distance between $G$ and
$F$ in $(y, p)$ coordinates, $F(y) = G(y + \Delta(y))$, or

$$\Delta(y) = G^{-1}(F(y)) - y.$$

Evaluating $\Delta(y)$ at a quantile, $y = F^{-1}(\tau)$, one obtains the quantile treatment effect:

$$\delta(\tau) = G^{-1}(\tau) - F^{-1}(\tau).$$

In fact, $\delta(\tau)$ is the familiar distance from the 45 degree line in the quantile-quantile plot of
$F^{-1}(\tau)$ vs. $G^{-1}(\tau)$. For example, the quantile treatment effect could take a simple location-shift
form $\delta(\tau) = \delta_0$, for all $\tau$, or a scale-shift form $\delta(\tau) = \delta_0 + \delta_1 F^{-1}(\tau)$. More generally, the
treatment can affect such features as skewness, kurtosis, and other moments, all summarized
by the quantile shift effect $\delta(\cdot)$. This 2-sample formulation leads to the linear model (1),
noting that $Q_{Y|D_i}(\tau) = F^{-1}(\tau) + \delta(\tau) D_i$, where $D_i$ is the treatment indicator.
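The quantile treatment effect $\delta(\tau) = G^{-1}(\tau) - F^{-1}(\tau)$ can be evaluated directly once $F$ and $G$ are specified. The sketch below uses normal distributions chosen only for illustration; `quantile_treatment_effect`, `F`, `G_shift`, and `G_scale` are hypothetical names.

```python
from statistics import NormalDist


def quantile_treatment_effect(tau, untreated, treated):
    """delta(tau) = G^{-1}(tau) - F^{-1}(tau), the horizontal distance between G and F."""
    return treated.inv_cdf(tau) - untreated.inv_cdf(tau)


F = NormalDist(0.0, 1.0)        # untreated distribution (illustrative)
G_shift = NormalDist(1.0, 1.0)  # pure location shift: delta(tau) = 1 for all tau
G_scale = NormalDist(1.0, 2.0)  # scale shift: delta(tau) = 1 + F^{-1}(tau)
```

Under `G_shift` the effect is the constant $\delta_0 = 1$; under `G_scale` it has the scale-shift form with $\delta_0 = \delta_1 = 1$, so the treatment effect is negative in the lower tail and positive in the upper tail.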

Thus, firstly, we may view linear QR models $X'\beta(\tau)$ as a generalization of the p-sample
problem to the case of continuous or polychotomous treatment, represented by covariates
$X_{(j)}$, $j \ge 2$, as in dose-response studies. In this case, $\beta_{(j)}(\tau)$ can be interpreted as a partial
derivative with respect to a change in the treatment, or, equivalently, a quantile treatment
effect of changing the treatment $X_{(j)} = x_0$ to $x_0 + 1$. Secondly, the introduction of covariates $X$
is a pertinent way to control for observed heterogeneity. Regardless of whether covariates $X$
bear causal or control meanings, we refer to $\beta(\tau)$ as the quantile treatment or the quantile
shift effects.

Equivariance to monotone transformations is a useful property of quantile regression models,
as emphasized in Powell (1991), Chaudhuri, Doksum, and Samarov (1997), and Koenker
and Geling (2001) in connection with censoring, survival, and transformation models. We shall
state a slightly more general form. For a given measurable transformation $T_Z(Y)$ of the variable
$Y$ and other variables $Z$, monotone increasing in $Y$, it is obvious that

$$Q_{T_Z(Y)|X,Z}(\tau) = T_Z(Q_{Y|X,Z}(\tau)), \tag{4}$$

since

$$P[Y \le Q_{Y|X,Z}(\tau) \mid X, Z] = P[T_Z(Y) \le T_Z(Q_{Y|X,Z}(\tau)) \mid X, Z].$$

For example, if we estimate a linear model $X'\beta(\tau)$ for the logarithm of survival time $T$,
$Y = \log(T)$, as in accelerated failure time models, then $Q_T(\tau|X)$ equals $\exp(X'\beta(\tau))$. This
property helps interpret and communicate data-analytic findings, and it is not shared by the
conditional mean models. Koenker and Geling (2001) contains a lucid illustration in the
case of quantile regression survival analysis.
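The equivariance property, and its failure for means, can be checked numerically. The following sketch simulates log survival times (the normal distribution, seed, and sample size are arbitrary assumptions made for illustration) and compares quantiles and means before and after exponentiation.

```python
import math
import random


def sample_quantile(sample, tau):
    """Order-statistic quantile inf{y : F_n(y) >= tau}."""
    ys = sorted(sample)
    k = max(1, math.ceil(tau * len(ys) - 1e-12))
    return ys[k - 1]


rng = random.Random(0)  # fixed seed so the illustration is reproducible
log_times = [rng.gauss(0.0, 1.0) for _ in range(1001)]  # Y = log(T)
times = [math.exp(y) for y in log_times]                # T = exp(Y)

q_log = sample_quantile(log_times, 0.75)
q_time = sample_quantile(times, 0.75)    # agrees with exp(q_log)

mean_log = sum(log_times) / len(log_times)
mean_time = sum(times) / len(times)      # exceeds exp(mean_log) by Jensen's inequality
```

Because `exp` preserves the ordering of the observations, the 0.75 quantile of `times` is the exponential of the 0.75 quantile of `log_times`, whereas the mean of `times` is strictly larger than the exponential of the mean of `log_times`.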

Transformation equivariance naturally leads to models of censored data. Assume that the
latent variable $Y_i^*$ is left-censored^2 by the observable, possibly random, censoring points $C_i$,
and we collect

$$Y_i = Y_i^* \vee C_i, \quad X_i, \quad \delta_i = 1\{Y_i > C_i\}. \tag{5}$$

Assume $Y_i^*$ is independent of the censoring point $C_i$ conditionally on $X_i$, that is, for all $y \in \mathbb{R}$:

$$P(Y_i^* \le y \mid X_i, C_i) = P(Y_i^* \le y \mid X_i), \quad \text{so that} \quad Q_{Y_i^*|X_i,C_i}(\tau) = X_i'\beta(\tau). \tag{6}$$

The conditional independence assumption is more realistic than the frequently made assumption
of independence between $(Y_i^*, X_i)$ and the censoring variables $C_i$. (In the Stanford
example, for instance, the "age" and "surgery" variables are significantly correlated
with the censoring times.) The assumption that censoring points are known for
all $i$ is realistic in many (but clearly not all) situations. For example, in the analysis of
post-transplant survival times in the Stanford data set, we can compute all censoring points
because we know the transplant and the last follow-up dates for each $i$. In other situations,
it is possible to impute them (see Powell (1986) for discussion).^3 In the extramarital affairs
example, the censoring point is, naturally, zero, or "fixed", for all observations. This type
of censoring is very common in social, psychometric, technometric, and econometric studies.
Conditioning on $C_i$, assumption (6) and transformation equivariance yield the following
censored QR model:

$$Q_{Y_i|X_i,C_i}(\tau) = X_i'\beta(\tau) \vee C_i. \tag{7}$$


^2 Right-censoring is handled by reversing the sign.

^3 Here we do not consider the cases where the censoring points are unobserved. There is a growing literature
on median and M-estimation under random censoring; see e.g. Yang (1997) and Zhou (1992).

Early formulations of (7) go back to Amemiya (1973) and Tobin (1958), who considered
Gaussian errors and a fixed censoring point at 0:

$$Y_i = X_i'\beta + u_i \ \text{ if } X_i'\beta + u_i > 0, \ \text{ and } 0 \text{ if not}; \quad u_i \sim \mathcal{N}(0, \sigma^2). \tag{8}$$

This model states that $Q_{Y|X}(1/2) = X'\beta \vee 0$ and assumes a parametric form of the distribution
of the error $Y_i - X_i'\beta \vee 0$.

Historically, Tobin (1958) first formulated (8) and Amemiya (1973) provided and justified
consistent estimators. Analyzing expenditure on durable goods, Tobin accounted for the
fact that expenditure was nonnegative and frequently assumed the value 0. In many examples,
specifically Tobin's example, such a censoring formulation is recurrently criticized, since the
negative values are sometimes meaningless or given imaginary interpretations. We stress
that although $X'\beta$ may be hard to interpret as the mean of a "latent variable," the conditional
median model $Q_{Y|X}(1/2) = X'\beta \vee 0$ for the observed expenditure is an excellent one. Indeed,
if, conditional on $X$, the probability of "censoring" is greater than 1/2, then the conditional median of
expenditure is zero. Otherwise, the conditional median is (reasonably approximated as) $X'\beta$.
In his path-breaking work, Powell (1984) was the first to stress median and quantile regression
in the distribution-free, fixed censoring world. For this reason, we refer to this model as
the Powell CQR model, and to its normal form as the Amemiya-Tobin model.

Having model (7) in mind, we can speak of the quantile treatment/shift effects. $\beta(\tau)$ now
summarizes the effect of a treatment on both the censored and uncensored latent variable.
When the latent variable has real meaning, such as the survival time, the interpretation
is the same as discussed earlier, except that censoring may prevent identification of the
treatment effects for all quantiles of interest. When the latent variable is imaginary, as in
the expenditure or extramarital examples, it is important to indicate the relevance of $\beta(\tau)$.
Define the censored quantile treatment/shift effect as a partial derivative

$$\delta_j(\tau, x, c) = \lim_{v \to 0} \frac{Q_{Y|x + e_j v, c}(\tau) - Q_{Y|x, c}(\tau)}{v}
= \lim_{v \to 0} \frac{(x + e_j v)'\beta(\tau) \vee c - x'\beta(\tau) \vee c}{v}
= 1(x'\beta(\tau) > c)\, \beta_j(\tau).$$

$\delta_j(\tau, x, c)$ characterizes the effect of changes in components of $x$ keeping everything else fixed
($e_j$ is a zero vector with 1 in the $j$-th position). $\beta(\tau)$ in turn characterizes the treatment
effect $\delta(\tau, x, c)$.
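The censored shift effect can be illustrated with a short Python sketch that evaluates the censored quantile $x'\beta(\tau) \vee c$ and the effect $1(x'\beta(\tau) > c)\beta_j(\tau)$ for a given coefficient vector. The function names and the numerical values in the note below are hypothetical.

```python
def censored_quantile(x, beta, c):
    """Censored conditional quantile: max(x'beta, c)."""
    return max(sum(xi * bi for xi, bi in zip(x, beta)), c)


def censored_shift_effect(x, beta, c, j):
    """1(x'beta > c) * beta_j: the coefficient matters only above the censoring point."""
    index = sum(xi * bi for xi, bi in zip(x, beta))
    return beta[j] if index > c else 0.0
```

With `beta = [-1.0, 1.0]` and `c = 0.0`, the point `x = [1.0, 2.0]` has index 1 and effect 1.0 for the second covariate, while `x = [1.0, 0.5]` has index -0.5, lies in the censored region, and has effect 0.0. A finite difference of `censored_quantile` reproduces both values.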

As the class of CQR models inherits the qualities of the uncensored QR models, such as
distribution-free character and tolerance of heteroscedasticity, they compare favorably to the normal
Amemiya-Tobin models, distribution-free homoskedastic accelerated failure models, and Cox
proportional hazard models. For example, accelerated failure time models are of the form

$$\log(T) = \alpha + X_{-1}'\theta + u, \tag{9}$$

where $u$ has an unknown distribution $F$ and $X_{-1}$ represents covariates without the intercept. This
is a location-shift model with quantile treatment effect given by $\theta$. Many useful proportional
hazard Cox models are special cases of (9), for example, one with a Weibull baseline hazard.
More generally, Cox models can be written as $\ln \Lambda_0(T) = \alpha + X_{-1}'\theta + \sigma u$, for an unknown
integrated baseline hazard function $\Lambda_0$ and a Gumbel variate $u$. The vector $\theta$ summarizes the
relevant quantile treatment effects, so that the Cox model is a location-shift one up to a
transformation. From a constructive point of view, one or the other model may be preferred
depending on the context. However, when heteroscedasticity and a distribution-free spirit are
desired, the CQR approach is valuable.

2.2. Motivation for k-step Estimators. Sample regression quantiles may be defined in
several ways. The most widely used method, due to Koenker and Bassett (1978), is ingeniously
simple. Suppose we have $n$ observations $\{Y_i, X_i\}$. In the no-covariates case, the sample $\tau$-th
quantile $\hat\beta(\tau)$ is generated by solving the problem (Ferguson (1967)):

$$\min_{\beta \in \mathbb{R}} \sum_{i=1}^n \rho_\tau(Y_i - \beta),$$

where $\rho_\tau(x) = (\tau - 1(x < 0))x$. Koenker and Bassett (1978) extended this concept to the
regression setting as follows:

$$\min_{\beta \in \mathbb{R}^d} \sum_{i=1}^n \rho_\tau(Y_i - X_i'\beta). \tag{10}$$

They, as well as Powell (1984) and Portnoy (1991), developed the asymptotics. The asymptotic
distribution parallels that of ordinary sample quantiles. Also, as Koenker and Bassett
(1978) elaborated, the sample regression quantiles inherit the good robustness and equivariance
features of the ordinary sample quantiles. Robustness is a strong motivation for quantile
regression even in canonical, seemingly normal, homoskedastic cases, as heavy-tailed admixtures
or outliers are known spoilers of many classical procedures.
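The no-covariates problem above can be verified numerically: minimizing the sum of check-function losses over a scalar returns a sample quantile. The sketch below searches over the data points, which is sufficient because the objective is piecewise linear with kinks only at the observations; the function names are illustrative.

```python
def check_loss(u, tau):
    """Check function rho_tau(u) = (tau - 1(u < 0)) * u."""
    return (tau - (1.0 if u < 0 else 0.0)) * u


def quantile_by_check_loss(sample, tau):
    """Minimize sum_i rho_tau(Y_i - b) over b, restricted to the observed values."""
    return min(sorted(sample),
               key=lambda b: sum(check_loss(y - b, tau) for y in sample))
```

For the sample (3, 1, 4, 1, 5, 9, 2, 6, 5), the minimizer at $\tau = 0.5$ is the median 4, and at $\tau = 0.25$ it is the third order statistic 2.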

For brevity assume for now $C_i = 0$ for all $i$. In the censored model (7), replacement of the linear
form with the semi-linear one $(X'\beta \vee 0)$,

$$\hat\beta_P(\tau) = \arg\min_{\beta} \sum_{i=1}^n \rho_\tau(Y_i - X_i'\beta \vee 0),$$

leads to the celebrated Powell (1984), Powell (1986) estimator. Powell (1984) established the
asymptotic normality of $\hat\beta_P(\tau)$ and developed an inference theory, paving the way and setting
forth a standard for future work.
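A short Python sketch of the Powell objective with the semi-linear form $X'\beta \vee c$ is given below. The function names are hypothetical. The example in the note also shows a known feature of this objective, namely that it is not convex in $\beta$, which is one source of the computational difficulty.

```python
def check_loss(u, tau):
    """Check function rho_tau(u) = (tau - 1(u < 0)) * u."""
    return (tau - (1.0 if u < 0 else 0.0)) * u


def powell_objective(beta, xs, ys, tau, c=0.0):
    """Sum of check losses of Y_i - max(X_i'beta, c) over all observations."""
    total = 0.0
    for x, y in zip(xs, ys):
        fitted = max(sum(xj * bj for xj, bj in zip(x, beta)), c)
        total += check_loss(y - fitted, tau)
    return total
```

With a single observation `xs = [[1.0]]`, `ys = [1.0]` and `tau = 0.5`, the objective equals 0.5 at `beta = [-2.0]`, 0.0 at `beta = [1.0]`, and 0.5 at their midpoint `beta = [-0.5]`, which exceeds the average 0.25 of the endpoint values, so the objective is not convex.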

Despite its intuitive appeal, this method has not become popular in empirical research
due to its well-known computational difficulty.^4 See, for instance, Fitzenberger (1997b) and
Fitzenberger (1997a) for remarkable research as well as extensive simulations. Buchinsky
(1994) and Fitzenberger (1997a) designed ingenious computational algorithms, which Fitzenberger
(1997a) recommends for low degrees of censoring, while admitting that "all practical
algorithms perform quite poorly when a lot of censoring is present".^5 His conclusion is well
substantiated by a very extensive Monte Carlo experiment with tens of different practical designs.
All experiments involved only one regressor (!). An increase in dimension leads to further
complications. In many empirical applications, the censoring is quite heavy and the dimensionality
is also high. For example, in the affairs example of section (4), the degree of censoring is 68%;
in the heart dataset of section (3.6.2), it is 37%. The numbers of regressors in these two datasets
are 9 and 3, respectively. This serves as one strong, but not the only, motivation for the matter
herein. Arguably, an important goal of statistical theory is the design of estimators that are both
theoretically elegant and implementable, practically attractive. It is that requirement that
makes the problem at hand particularly challenging.

^4 We know of about three or four applications of censored median regression in econometrics. In contrast,
Amemiya-Tobin or Cox approaches found hundreds of applications, if not thousands.

^5 For example, see Fitzenberger (1997a), p. 15. In the case of 50% censoring, one regressor, and small sample
n = 100, in several important designs (e.g. A, B), the frequency of computing the Powell estimator ranged
from 5% to 37% for various algorithms. For some of the other designs results were better: convergence

In part motivated by such limitations, recent remarkable work by Buchinsky and Hahn
(1998) and Khan and Powell (2001) suggested a number of alternative estimators. Buchinsky
and Hahn (1998) proposed to first estimate the propensity score $h(X_i) = P(\delta_i = 1 \mid X_i)$ by
a nonparametric kernel regression, then select the subset $\{i : \hat h(X_i) > 1 - \tau\}$ of the whole
sample, i.e. those observations $i$ where the conditional quantile line is above zero, $X_i'\beta(\tau) > 0$,
and then use a transformed QR on the selected sample. Khan and Powell (2001) also proposed
to use any of three methods to perform the first-stage selection: (i) maximum score
estimators of the regression quantile, (ii) the nonparametric kernel propensity score estimator
$\hat h(X_i)$, (iii) the nonparametric locally linear conditional quantile estimator of Chaudhuri (1991)
(denoted $\hat q(X_i)$). In the second step, they obtain the estimator by running a weighted QR:
$\min_\beta \sum_{i=1}^n \lambda_i \rho_\tau(Y_i - X_i'\beta)$, where e.g. $\lambda_i = K(\hat h(X_i) - (1 - \tau) - c)$ or $K(\hat q(X_i) - c)$, etc. The
two-stage estimators are somewhat less efficient than the Powell estimator due to smoothing
and trimming. Conceptually, these estimators share the ideas behind the construction of the
Powell (1986) estimator (especially the estimating equation version of it), except that Powell
imposed simultaneity to obtain his single-step estimator.

The suggested first stages are extremely attractive, but are only practical in low dimensions,
and have slow convergence rates. Local kernel smoothers apply to (sufficiently) continuous
variables only, whereas a lot of applications, including ours, have many (sufficiently) discrete
covariates. This is very confounding. Of course, asymptotic theory suggests that
$\sqrt{n}$-consistent estimates could be obtained by averaging within the cells. In the affairs example, there
will be on average roughly $6800/8^5 \approx 0.2$ observations per cell; in the heart example,
$69/(15 \times 2^2) \approx 1$. Thus such asymptotics is void here. The computational burden (especially
when the bandwidths are carefully chosen) is very substantial in high dimensions and large data
sets for all of the first-stage estimates. As a result, we simply cannot use any of the available
estimators in our examples and many other real-life applications due to 1) heavy censoring, 2)
high dimensionality, 3) very small or large samples, and 4) the polychotomous nature of numerous
regressors. As our interest concerns many quantiles, these settings are even more confounding.
From a constructive angle, however, we strongly stress that the mentioned estimators can be
potentially fruitful in a lot of conceivable cases.
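The cell-count arithmetic behind this argument is easy to reproduce. In the sketch below, the numbers of levels per covariate (five covariates with 8 levels in the affairs data; one covariate with 15 levels and two binary covariates in the heart data) are assumptions inferred from the approximate ratios quoted in the text, not values stated explicitly in it.

```python
def observations_per_cell(n, levels):
    """Average number of observations per cell of a fully crossed discrete design."""
    cells = 1
    for k in levels:
        cells *= k
    return n / cells


affairs = observations_per_cell(6800, [8] * 5)    # 6800 / 8**5
heart = observations_per_cell(69, [15, 2, 2])     # 69 / (15 * 2**2)
```

Both ratios are at or below roughly one observation per cell, which is why within-cell averaging cannot deliver usable estimates in these examples.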

^5 (continued) frequencies ranged from 30% to 70%. These results were obtained for the case of one regressor. In the case of
many regressors and larger n, the results can be expected to be worse.

In an unpublished report that precedes this one, we suggested a number of flexible parametric
techniques that exploit the global approximation and classification ideas and lead
to consistent and asymptotically normal estimators that are as efficient as the benchmark
Powell estimator. The present paper extracts and further develops what we think is an essential,
most implementable and usable part of our previous research. The current approach
is based on the structured modeling restrictions that we put on the censoring probability.
These restrictions cause a reduction in generality, but only a small one, since they still
incorporate, for example, Amemiya-Tobin, many Cox models, and accelerated failure time
models as very important special cases while at the same time allowing for heteroscedasticity
and distribution-free character. As a result, an easily computable (comparable to
linear least squares), well-behaved, robust estimator is available. Due to its good robustness
properties, it offers not only an efficient, practical way to estimate the general Powell CQR
models, but also a good way to estimate very important traditional models. Because it is
easily computable, its finite-sample properties can be studied in fine detail.

3.  Simple  3-step  and  k-step  CQR  Estimators 

3.1.  The  procedure:  This  section  describes  the  steps  of  the  estimator  and  briefly  sketches 
the  basic  ideas  behind  them. 

3.1.1. The Steps.

Step 1. Estimate a parametric classification (probability) model^6

$$P(\delta_i = 1 \mid \tilde X_i) = p(\tilde X_i'\gamma),$$

where $\delta_i$ is the indicator of not-censoring and $\tilde X_i$ indicates desired transforms of $X_i$ and $C_i$.
Next, select the sample $J_0 = \{i : p(\tilde X_i'\hat\gamma) > 1 - \tau + c\}$, where $c$ is strictly between 0 and $\tau$ and
not too small (practical choice is discussed in the appendix).

Step 2. Obtain the initial (inefficient) estimator $\hat\beta_0(\tau)$ by running the standard QR:

$$\min_{\beta} \sum_{i \in J_0} \rho_\tau(Y_i - X_i'\beta). \tag{11}$$

Next select $J_1 = \{i : X_i'\hat\beta_0(\tau) > C_i + \delta_n\}$. $\delta_n$ is a small positive number going to zero slowly
as $n \to \infty$ (the formal condition on $\{\delta_n\}$ is that $\sqrt{n} \times \delta_n \to \infty$ and $\delta_n \searrow 0$). This step
therefore consistently selects the largest subset of $i$ such that $X_i'\beta(\tau) > C_i$, building up the
efficiency of the next step.

Step 3. Run QR (11) with $J_1$ in place of $J_0$.

Denote this 3-step estimator $\hat\beta_1(\tau)$.

Step 4. (Optional) Further repeat step 3 once or a finite number of times, using the sample
$J_I = \{i : X_i'\hat\beta_{I-1}(\tau) > C_i + \delta_n\}$ in place of $J_0$ [$I = 2, 3, \ldots$].

In step 4 each repetition involves selecting $J_I = \{i : X_i'\hat\beta_{I-1}(\tau) > C_i + \delta_n\}$, and then obtaining
$\hat\beta_I(\tau)$ from (11) using the sample $J_I$. Denote the k-step estimators as $\hat\beta_I(\tau)$.
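The steps above can be sketched in Python for the simplest case of an intercept, one regressor, and a fixed censoring point. This is a toy illustration under stated assumptions, not the authors' implementation: step 1 uses a least-squares linear probability model as a stand-in for logit or probit, and the QR in (11) is solved by brute force over lines through pairs of data points (valid for two parameters, since a minimizer of the piecewise-linear objective interpolates two observations) rather than by linear programming. All function names and tuning values are hypothetical.

```python
from itertools import combinations


def check_loss(u, tau):
    """Check function rho_tau(u) = (tau - 1(u < 0)) * u."""
    return (tau - (1.0 if u < 0 else 0.0)) * u


def qr_line(xs, ys, tau):
    """Exact QR for y ~ b0 + b1 * x by searching lines through pairs of points."""
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue
        b1 = (y2 - y1) / (x2 - x1)
        b0 = y1 - b1 * x1
        loss = sum(check_loss(y - b0 - b1 * x, tau) for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, b0, b1)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


def linear_probability(xs, deltas):
    """Least-squares fit of delta on (1, x); a crude stand-in for logit or probit."""
    n = len(xs)
    mx, md = sum(xs) / n, sum(deltas) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxd = sum((x - mx) * (d - md) for x, d in zip(xs, deltas))
    g1 = sxd / sxx
    return md - g1 * mx, g1


def three_step_cqr(xs, ys, tau, c, delta_n, censor_at=0.0):
    """3-step CQR sketch for one regressor and a fixed censoring point."""
    # Step 1: fit the classification model and keep J0 = {p_hat > 1 - tau + c}.
    deltas = [1.0 if y > censor_at else 0.0 for y in ys]
    g0, g1 = linear_probability(xs, deltas)
    j0 = [i for i, x in enumerate(xs) if g0 + g1 * x > 1 - tau + c]
    # Step 2: initial QR on J0, then select J1 = {x'beta_0 > C + delta_n}.
    b0, b1 = qr_line([xs[i] for i in j0], [ys[i] for i in j0], tau)
    j1 = [i for i, x in enumerate(xs) if b0 + b1 * x > censor_at + delta_n]
    # Step 3: rerun QR on J1.
    return qr_line([xs[i] for i in j1], [ys[i] for i in j1], tau)
```

As a sanity check, on noise-free data with latent line $-0.5 + x$ censored at zero, `three_step_cqr(xs, ys, tau=0.5, c=0.1, delta_n=0.05)` recovers the intercept and slope of the latent line, because both selected samples contain only uncensored observations. A fourth step would repeat the selection and QR with the step-3 estimate in place of the initial one.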


^6 The single-index restriction, made for brevity of proofs, can be relaxed by allowing the binary model to
have plausible stochastic complexity as measured by the uniform covering or bracketing entropy. The
single-index class is particularly suited here from a practical point of view.

3.1.2. What is in the steps? Some details are as follows. In step 1 we may use, for
example, logit, probit, extreme value, linear (polynomial), or any other model that fits the data
$\{\delta_i, \tilde X_i\}$ well. $\tilde X_i$ denotes any desired transform of $X_i$. For example, $\tilde X_i$ may consist of $X_i$, $C_i$,
and their squares (Remark 4 allows for power series and regression spline approximations). In
general, this gives an inconsistent estimator of the propensity score

$$h(X_i) = P(\delta_i = 1 \mid X_i),$$

but the inconsistency is not important as long as the misspecification is not too severe.

Indeed, the goal of step 1 is to select some, and not necessarily the largest, subset of
observations where $h(X_i) > 1 - \tau$, i.e. where the quantile line $X_i'\beta(\tau)$ is above zero, so as to
obtain an initial consistent but inefficient estimator $\hat\beta_0(\tau)$. This task can be carried out if,
for example, $p(\tilde X_i'\gamma_0) - c$ is a lower bound on $h(X_i)$:

$$\text{a.s.} \quad p(\tilde X_i'\gamma_0) - c < h(X_i), \tag{12}$$

($\gamma_0 = \operatorname{plim} \hat\gamma$) and it is nontrivial, meaning that the selected set $J_0$ is sufficiently rich: the matrix
$E X_i X_i' 1(i \in J_0)$ is asymptotically invertible. Greater $c$ and a better model $p(\cdot)$ simplify the
selection task. This is a weak requirement since $p(\cdot)$ is not required to be a distribution
function. Thus the envelope restriction is intuitive and not very restrictive, but it may be
replaced by an even weaker condition, the separating hyperplane restriction in Theorem
1 (cf. Figure 1 for motivation). See the subsections below for formal details and further
discussion of this assumption. Appendix B contains details for practical implementation.

The above construction assumes that the estimator $\hat\gamma$ is reasonable and converges to a value
$\gamma_0$ that minimizes a sensible distance between $h(X_i)$ and the model $p(\tilde X_i'\gamma)$. For example, $\hat\gamma$
may be defined by minimizing $\sum_{i=1}^n (\delta_i - p(\tilde X_i'\gamma))^2$, in which case under standard conditions
$\hat\gamma \to \gamma_0$ that solves $\min_\gamma E[h(X_i) - p(\tilde X_i'\gamma)]^2$. Alternatively, quasi-ML methods can be used
and will be equivalent to weighted least squares. In our empirical section we employed a
polynomial logistic model and estimated it using conditional MLE.

Another attractive choice is the Fisher-Rao discriminant analysis. The discriminant perspective
is justifiable as follows (treat $C_i$ as part of $X$ or set $C_i = 0$ for brevity). If $X_i|\{\delta_i = 1\}$
has density or p.m.f. $g_1(x)$ and $X_i|\{\delta_i = 0\}$ has $g_0(x)$, then by Bayes' rule:

$$P(\delta_i = 1 | X_i = x) = \frac{q_1 g_1(x)}{q_1 g_1(x) + q_0 g_0(x)},$$

where $q_1 = P(\delta_i = 1) = 1 - q_0$. Approximating $g_1(x)$ and $g_0(x)$ by normality with different
means and variances leads to the classical logistic linear-quadratic discriminant analysis,
LQDA (see e.g. Amemiya (1985), p. 282). Other forms of $g_1$ and $g_0$ could be chosen in view
of their concrete problems, but even the normality assumption has been known to produce good
results (cf. Efron (1975), Press and Wilson (1978), Amemiya and Powell (1983)). Using both
simulated and real-life examples, the studies found a surprisingly good performance of the
LDA classifier even when the regressors were binary! Naturally, when the normal approximation
becomes realistic, discriminant analysis does much better than conditional approaches
(because it approximates the unconditional MLE). LQDA was among the top 3 classifiers for
11 out of 22 datasets in the impressive Statlog project (see Michie and Taylor (ed.) (1994)) and
out-competed many sophisticated classifiers. The Statlog project tested 23 algorithms on 22
large-scale, commercially important problems (Michie and Taylor (ed.) (1994)). For excellent
further refinements and generalizations of the discriminant approach see the literature cited
in the next subsection.

In the terminology of modern classification analysis.
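The link between Bayes' rule and the logistic linear-quadratic form can be checked numerically. The sketch below is only an illustration for a scalar regressor: the function names and the parameter values used in any call are hypothetical, and the two class densities are assumed to be exactly normal.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posterior_bayes(x, q1, m1, s1, m0, s0):
    """P(delta = 1 | X = x) from Bayes' rule with normal class densities."""
    num = q1 * normal_pdf(x, m1, s1)
    return num / (num + (1.0 - q1) * normal_pdf(x, m0, s0))

def posterior_logistic_quadratic(x, q1, m1, s1, m0, s0):
    """The same probability written as a logistic function of a quadratic in x."""
    a = 0.5 * (1.0 / s0**2 - 1.0 / s1**2)            # coefficient on x^2
    b = m1 / s1**2 - m0 / s0**2                      # coefficient on x
    c = (math.log(q1 / (1.0 - q1)) + math.log(s0 / s1)
         + 0.5 * (m0**2 / s0**2 - m1**2 / s1**2))    # intercept
    return 1.0 / (1.0 + math.exp(-(a * x * x + b * x + c)))
```

With equal variances the quadratic coefficient vanishes and the index is linear in x, which is the linear discriminant case.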

Formally we shall not confine ourselves to a particular estimator; instead, an assumption
such as (12) or its generalization will be required to hold.

Under conditions to be stated, $\hat\beta_1(\tau)$ is asymptotically normal with variance equal to
that of the Powell estimator. Thus, starting with a "good" subset of observations, only two
recomputations of QR suffice to obtain the Powell-efficient estimator. Estimators $\hat\beta_I(\tau)$ are
also asymptotically normal with variance equal to that of the Powell estimator. What is the
asymptotic rationale for considering the fourth step? For $I \ge 2$ the estimator $\hat\beta_I(\tau)$ has the
efficiency structure of Powell's one-step estimator in the sense that the selector in step 3
has $\sqrt{n}$ rate of convergence and Powell's variance, which is more efficient than the selector
$\hat\beta_0(\tau)$ used in computing $\hat\beta_1(\tau)$. This efficiency structure is a unique property of the Powell
single-step estimator and our 4-or-more step estimators.

The QR iterations on step 4 are somewhat analogous to those in the pioneering and
remarkable ILPA algorithm that Buchinsky (1994) designed for the Powell problem.^ However,
going beyond the third or fourth step is not desirable on statistical grounds (not to
mention computational reasons), based on our Monte-Carlo experience. Our results show that
given the first classification step only two recomputations of quantile regression lead to an
efficient estimator (relative to the Powell estimator). (Regarding computational aspects, the
faster interior point algorithms, Koenker and Portnoy (1987), may be preferred to linear
programming.)

In summary, the estimation procedure has two very distinct features: a very simple,
parametric classification first step, and the additional third and optional further steps. The
estimators are as efficient as the Powell estimator.

3.2. The Model Beneath: Normative and Agnostic Perspectives. The canonical
CQR model in (7), together with the "nontrivial envelope" restriction (5), can be thought of
as a model. This additional assumption is the critical ingredient that yields the simplicity. How
restrictive is it?

Set $C_i = 0$ to simplify notation. A popular normal model assumes that conditional on
$X_i$, $Y_i$ is conditionally homoscedastic normal. Then the propensity score $h(x)$ is $\Phi(x'\alpha)$, for the
normal c.d.f. $\Phi$. A significantly more general CQR model can be immediately obtained by
simply assuming that $\Phi(\tilde x'\gamma_0) - c$ is a nontrivial envelope of an unknown propensity score
$h(x)$, where e.g. $\tilde x = (x, x^2, \ldots)$. Such an assumption imposes neither normality nor conditional
homoscedasticity nor a location-scale sub-model. Similarly, if the benchmark model is, say,
the Weibull proportional hazard model, as in the section about the heart example, then
$h(x) = g(x'\alpha)$ for the Gumbel c.d.f. $g$. A much more flexible CQR model is obtained by
assuming that $g(\tilde x'\gamma_0) - c$ is a nontrivial envelope of the propensity score $h(x)$.

^ See Buchinsky (1994) or Fitzenberger (1997b) for details. The basic idea is to start at a value $\beta(\tau)$, say 0, and
then proceed with iterative linear programming computations until convergence is reached. The convergence
to Powell's estimator is not guaranteed and can be quite rare; see Fitzenberger (1997b) and earlier discussions.
The convergence to a local optimum does not lead to a consistent estimator; cf. Powell (1984), Powell (1986).

Figure 1. How it works. The solid line depicts the conditional
quantile function and the propensity score. The propensity score
equals $1 - \tau$ for the value of $X$ s.t. the conditional quantile line
$X'\beta(\tau)$ equals the censoring point $C_i = 0$. It is above $1 - \tau$
for $X$ s.t. $X'\beta(\tau) > 0$, and below it if $X'\beta(\tau) < 0$. Once a
sample is s.t. $X_i'\beta(\tau) > 0$, the conditional quantile function
of the uncensored model can be estimated by the usual linear
QR. The Initial Step involves fitting an "envelope" of the propensity
score, and selecting all $i$ s.t. $X_i \in [S_1, S_1']$. Note that the
"envelope" is not ideal, but this is irrelevant, since it acts as a
good separating hyperplane selecting a sub-set of $i$ s.t. $X_i'\beta(\tau) > 0$.
The Next Step fits the QR line, which is used to select all
$i : X_i \in [S_2, S_2']$. The Third Step uses the selected sample,
which asymptotically gets close to the ideal: $i : X_i \in [0, S_2']$.
Note that $h(X)$ can cross the $1 - \tau$ line only once at $X'\beta(\tau) = 0$.
In dimensions greater than 1, $X \in \mathbb{R}^d$, the crossing points are
defined by the hyperplane $X'\beta(\tau) = 0$, which has a zero span
in $\mathbb{R}^d$. Thus, in $\mathbb{R}^d$ the crossing points form a singularity as well.

More generally, Theorem 1 replaces the intuitive envelope restriction by a much weaker
separation restriction, which requires that once $p(\tilde x'\gamma_0)$ is above a threshold $c$, $h(x) > 1 - \tau$.
This assumption allows the envelope to be an incorrect lower bound of the propensity score,
but only requires that it does a good job at selecting a correct subset of observations. Figure 1
illustrates the situation. For well behaved models, the further away from zero the conditional
quantile function, the further away from $1 - \tau$ is the propensity score function, and the easier it
is to carry out the classification. Classification/discrimination problems of this kind can be
subjected to the standard or more modern classification analysis (e.g. Breiman, Friedman,
Olshen, and Stone (1984), LeBlanc and Tibshirani (1996), Ripley (1996), Vapnik (2000)).
Therefore, in principle, many elaborate, structured strategies for the first classification step
are available. We confine our discussion to the simplest, most familiar methods.

The  justification  outlined  above  is  based  on  a  classical,  asymptotic  point  of  view.  An 
alternative  is  to  view  such  a  procedure  as  a  shrinkage  method  where  the  first  step  trades  bias 
of  approximating  a  propensity  score  envelope  for  smaller  variance.  We  also  argued  above 
that  due  to  the  special  nature  of  the  problem  the  bias  itself  can  be  quite  small.  Thus,  the 
shrinkage  aspect  may  be  of  primary  importance  in  moderate-sized  samples.  This  is  confirmed 
by  the  computational  experiments  discussed  next. 

More generally, from a real-life (agnostic) perspective, both the linear quantile and the envelope
models are approximations. There is no reason to believe one is more correct than the
other, once both are reasonable and flexible models. Therefore, we hope this paper offers an
organized way of thinking about and building such models.

In summary, the model studied here is, of course, somewhat more
restrictive than the canonical Powell CQR model. Yet despite being more restrictive, the
emphasized model leaves the general, plausible features of the CQR intact. This model is also
congenial from many other perspectives. The estimator is easily computable and applicable
to such examples as the extramarital data (high censoring, very large sample, many categorical
regressors) or the Stanford heart data (small sample, high censoring, categorical regressors). It
does well in Monte-Carlo experiments and behaves very sensibly in real-life examples. We believe this
estimator will help increase the presently scarce applications of the CQR models. Appendix
B discusses other practical aspects of estimation and inference.


3.3. Large Sample Properties. The following assumptions are made in addition to
equations (1), (5)-(7). For $u_i(\tau) = Y_i^* - X_i'\beta(\tau)$ and all $\tau_j$, $j \le J$ of interest:

Assumption 1. (i) $\{(X_i, Y_i^*, C_i)\}$ are i.i.d.; $u_i(\tau)$ has density $f_{u_i(\tau)}(u|X_i)$, which is
bounded uniformly in $X_i$, from above, away from zero, and continuous, uniformly
in $u$ near zero. $u_i(\tau)$ has the unique $\tau$-th conditional quantile at 0. The support of
the distribution of $X_i$, $\mathcal{X}$, is compact. $X_i$ includes a constant.
(ii) $H_\eta(\tau) = E f_{u_i(\tau)}(0|X_i) X_i X_i' 1[h(X_i) > (1 - \tau) + \eta]$ is positive definite, for a fixed
constant $\eta \in (0, \tau)$.
(iii) The pair of the binary model $p$ and trimming constant $c$ form a nontrivial envelope
or a separating hyperplane of the propensity score: $\exists c > 0$, $v \in (0, \tau)$,

$p(\tilde X_i'\gamma) > (1 - \tau) + c$ implies $h(X_i) > (1 - \tau) + v$ a.s.,

for any $\gamma$ in a small neighborhood of $\gamma_0 = \mathrm{plim}\, \hat\gamma$; $E X_i X_i' 1\{p(\tilde X_i'\gamma_0) > (1 - \tau) + c\}$ is
invertible. $p(\cdot)$ is strictly increasing and continuous.
(iv) $P(\bar X_i'a > v)$ is Lipschitz in $a$ uniformly in $v$, for $a$ in an open neighborhood of $\gamma_0$ or
$\beta(\tau)$ and $\bar X_i$ denoting $\tilde X_i$ and $X_i$, respectively.

Theorem 1. Under the stated assumptions, as $\delta_n \times \sqrt{n} \to \infty$ and $\delta_n \downarrow 0$,

$$\sqrt{n}\,(\hat\beta_I(\tau) - \beta(\tau)) \xrightarrow{d} N\left(0, H_0^{-1}(\tau)\Lambda_0(\tau)H_0^{-1}(\tau)\right)$$

for finite $I \ge 1$, where $\Lambda_0(\tau) = \tau(1 - \tau)E(X_i X_i' 1\{h(X_i) > 1 - \tau\})$. Furthermore, the
same holds if any other consistent initial estimator $\hat\beta_0(\tau)$ is used in step 2, provided the
sequence $\delta_n \downarrow 0$ and $|\hat\beta_0(\tau) - \beta(\tau)|/\delta_n \to 0$. Furthermore, the joint asymptotic distribution
of several estimators for $\tau_i$, $i \le J$ is asymptotically normal, with covariance given by
$H_0^{-1}(\tau_i)\Lambda_0(\tau_i, \tau_j)H_0^{-1}(\tau_j)$, where $\Lambda_0(\tau, \tau') = [(\tau \wedge \tau') - \tau\tau']E(X_i X_i' 1\{h(X_i) > (1 - \tau) \vee (1 - \tau')\})$.

Thus the second part of the theorem allows for a wide range of alternative initial estimators
$\hat\beta_0(\tau)$.

Remark  1.  Our  proof  uses  an  approach  that  is  distinctly  simpler  than  those  in  the  cited 
literature  and  is  therefore  both  short  and  straightforward  to  understand. 

Remark 2. Assumptions (i) and (ii) are standard. Assumption (iii) allows the parametric
"probability" model $p(\tilde x'\gamma_0)$ to be moderately misspecified. The degree of misspecification is
controlled by the constant $c$. Assumption (iii) rationalizes the parametric first step, as we
have discussed. It requires very weak smoothness conditions on the "probability model" $p(\cdot)$:
continuity and strict monotonicity.

Remark 3. No assumptions are made about the rate of convergence of $\hat\gamma$ to its probability
limit $\gamma_0$ or of $\hat\beta_0(\tau)$ to $\beta(\tau)$. The trimming device is designed to eliminate the bias and
stochastic equicontinuity eliminates the impact of the variance of the preliminary steps.
Assumption (iv) requires, for example, the distribution of $\tilde X_i'a$ to respond smoothly to changes
in $a$ in the vicinity of $\gamma_0$. The Lipschitz condition can be replaced by the weaker Hölder
continuity.

Remark 4. It is useful to take account of the structural risk sourced by the complexity of
the envelope or separation models $\varphi(x) = p(\tilde x(x)'\gamma) - c$ that is increasing with $n$, e.g. $\tilde x_n(x)'\gamma_n$
may be a power series or regression polynomial spline series. The result is preserved as long
as $x \mapsto p(\tilde x_n(x)'\hat\gamma_n)$ converges uniformly in $C^r(\mathcal{X})$ to a fixed function, $r > \dim(x)/2$. The
latter property may depend on particular estimation methods, many of which are discussed
in Stone (1985), Cox (1988), Andrews and Whang (1990), Newey (1997), Chen and Shen
(1998) and references therein. For brevity we do not reproduce the regularity conditions and
particular methods of these references. Note that the resulting class of "envelope" models
is Donsker as long as $\Theta$ is a compact set in $C^r(\mathcal{X})$ of boundedly differentiable functions of
smoothness order $r > \dim(x)/2$ and $P(\varphi(X) > c)$ is Lipschitz in $\varphi$ w.r.t. the sup-norm in
$\ell^\infty(\mathcal{X})$, uniformly in $c$. This is true since the $L_2(P)$ bracketing number of $\mathcal{V}$ is of the same order
as that for $\mathcal{F} = \{x \mapsto \varphi(x) - c,\ \varphi \in \Theta,\ c \in [0, 1]\}$ by monotonicity of the indicator function and the
Lipschitz property. The bracketing numbers for $\mathcal{F}$ are given in Example 19.9 in van der Vaart
(1998) or Corollary 2.7.4 in van der Vaart and Wellner (1996). Hence if $r > \dim(x)/2$, the
bracketing entropy integral for $\mathcal{V}$ is finite and the Donsker property holds in view of a constant
envelope. Furthermore, the Donsker property is preserved when $\mathcal{V}$ is multiplied by a random
variable with a constant envelope, as required in the proof.

3.4. Remarks on Robustness. The approach pursued here enjoys certain robustness
properties, because it allows outliers and heavy tails for the dependent variable. The quantile
regression estimator of Koenker and Bassett (1978), used in steps 2 and 3, is stable under
arbitrary perturbations of $Y$ above and below the fitted line. This property is inherited from
that of the ordinary sample quantiles and can also be advantageous when the distribution
of $Y$ is heavy-tailed (e.g. in the extramarital example the Hill estimate of the tail index
implies thick tails). In such cases least-squares based methods, such as Buckley-James, may
suffer a great deal. The first step is also robust in typical implementations. For example,
as long as the censoring indicators $\delta_i$ are unaffected by perturbations in $Y_i$, the conditional
MLE estimates are stable. Discriminant analysis will enjoy the same property. Even if the
values of $\delta_i$ change, there have to be sufficiently many changes [$O(n)$] to distort the first step
significantly.
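The sample-quantile property that the regression quantiles inherit can be seen in a toy calculation. The sketch below uses made-up numbers and a hypothetical helper, `contaminate_above`; it only illustrates the unconditional median, not the regression estimator itself.

```python
import statistics

def contaminate_above(ys, center, shift):
    """Move every observation lying above `center` further up by `shift`."""
    return [y + shift if y > center else y for y in ys]

# Hypothetical sample; the values carry no empirical meaning.
ys = [2.0, 3.5, 1.0, 8.0, 5.0, 4.0, 6.5]
median_before = statistics.median(ys)

# Push all points above the median far into the upper tail.
ys_bad = contaminate_above(ys, median_before, 1000.0)
median_after = statistics.median(ys_bad)

mean_before = statistics.mean(ys)
mean_after = statistics.mean(ys_bad)
```

The median is unchanged because no observation crosses it, while the mean moves with the size of the perturbation, which is the sense in which least-squares based methods may suffer.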

3.5. Remarks on Computation. Computational expense of the estimator is comparable to
linear least squares and appears to be orders of magnitude faster than computing other survival/censored
regression estimators (based on our comparisons with survreg and coxph modules in S+). Step
1 is easily implemented by, for example, maximizing a convex, smooth quasi-likelihood. Steps
2 and higher all involve quantile regression with a very small computational expense due to
Portnoy and Koenker (1997), whose algorithm is based on the interior point and preprocessing
ideas. They show that both practical and theoretical computational times are of the
same order as for linear least squares. QR software for R and S+ environments is available
from statlib or http://www.econ.uiuc.edu/~roger/research/rq/rq.html. Other available
software includes QR modules in STATA, SAS, Xplore.

3.6.  Finite-Sample  Properties  and  the  Stanford  Example.  This  subsection  presents 
1)  a  graphical  bivariate  simulation  example,  2)  a  (standard)  test  of  sensibility  of  our  method 
on  the  well-known  Stanford  heart  data-set. 

3.6.1. A Simulation Example and Concordance Diagnostics. For the sake of clear visual
illustration, we consider the case of median regression in three simple bivariate examples. These
examples are intended to clarify the workings of the method. They also help devise simple
diagnostics. In each of the three models, there is a fixed censoring mechanism: $Y_i = Y_i^* 1(Y_i^* > 0)$.
The data $(Y_i^*, X_i)$, $i \le n = 200$, are generated as follows: $X_i = (1, x_i)'$, $x_i \sim N(1, 1)$, $u_i$ are
i.i.d. and mutually independent, $u \sim N(0, 3)$, $\beta(\tfrac12) = (0, 1)'$ and

• A. $Y^* = X'\beta(\tfrac12) + u$,
• B. $Y^* = X'\beta(\tfrac12) + x u$,
• C. $Y^* = X'\beta(\tfrac12) + u/x$.

By construction, $X'\beta(\tfrac12)$ is the conditional median function. Model A is a standard normal
model, while B and C are heteroscedastic normal models. 42%, 38%, and 38% of the 200
observations are censored in the respective data sets, which is not atypical for many empirical
applications.
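The three designs are easy to reproduce. The following is a minimal sketch, not the authors' simulation code: the function names and the seed are hypothetical, and $N(0, 3)$ is read here as variance 3 (pass `u_sd=3.0` for the other reading), so censoring shares need not match the percentages quoted above.

```python
import random

def simulate(model, n=200, seed=0, u_sd=3 ** 0.5):
    """Draw rows (x, y_star, y) from design 'A', 'B', or 'C'.

    Assumption: the error standard deviation is sqrt(3); the text's
    N(0, 3) could also mean a standard deviation of 3.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(1.0, 1.0)          # scalar regressor, N(1, 1)
        u = rng.gauss(0.0, u_sd)         # error term
        if model == "A":
            y_star = x + u               # homoscedastic
        elif model == "B":
            y_star = x + x * u           # error scale grows with |x|
        elif model == "C":
            y_star = x + u / x           # error scale shrinks with |x|
        else:
            raise ValueError("model must be 'A', 'B', or 'C'")
        y = max(y_star, 0.0)             # fixed censoring at zero
        rows.append((x, y_star, y))
    return rows

def censored_share(rows):
    """Fraction of observations with latent outcome at or below zero."""
    return sum(1 for _, y_star, _ in rows if y_star <= 0.0) / len(rows)
```

Since $\beta(\tfrac12) = (0, 1)'$, the median index $X'\beta(\tfrac12)$ is just the scalar regressor, which is why the latent outcome starts from `x` in every branch.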

Figure 2 illustrates our procedure. The rows correspond to models A-C. The first column
plots the pre-censored data $(Y_i^*, X_i)$, and the solid line shows the true regression function
$x'\beta(\tfrac12)$. The data clouds are representative of the models' nature.

A logistic probability model $p(\tilde x'\gamma)$ is estimated by maximizing a quasi-likelihood function,
using $\delta_i = 1(Y_i > 0)$, $\tilde X_i = (X_i, X_i^2, X_i^3)'$. In the second column, "+" points represent $(Y_i, X_i)$
selected by the logistic envelope using $p(\tilde X_i'\hat\gamma) > 1 - \tau + c$. $c \approx 0.1$ is chosen such that about
80% of the observations with $p(\tilde X_i'\hat\gamma) > 1 - \tau$ are selected. The "o" points depict the rest
of the sample. The second step regression quantile estimator, $\hat\beta_0(\tfrac12)$, is obtained by applying
the Koenker-Bassett procedure to the selected sample. The resulting fit $x'\hat\beta_0(\tau)$ is shown by
the dashed line and contrasted with the solid line of the true regression function $x'\beta(\tfrac12)$.

The third column illustrates the third step. The "+" points now represent $(Y_i, X_i)$ selected
using $X_i'\hat\beta_0(\tfrac12) > \delta_n$, where $\delta_n \approx 0.7$ is such that 90% of the observations with $X_i'\hat\beta_0(\tfrac12) > 0$
are selected. The "o" points represent the rest of the observations. The third step estimator,
$\hat\beta_1(\tfrac12)$, uses the selected sample, and the fit $x'\hat\beta_1(\tfrac12)$ is shown by the dashed line.
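The three steps can be sketched end to end on design A. This is a simplified illustration under several assumptions that are not in the text: the logistic index is linear in the regressor rather than a cubic polynomial, quantile regression is solved by brute force over all lines through two data points rather than by an interior point method, $N(0, 3)$ is read as variance 3, and all names and the seed are hypothetical.

```python
import math
import random

def fit_logit(xs, ds, iters=25):
    """Newton iterations for P(d = 1 | x) = 1 / (1 + exp(-(a + b * x)))."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, d in zip(xs, ds):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += d - p
            g1 += (d - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def fit_qr(xs, ys, tau):
    """Two-parameter quantile regression: some optimal line passes through
    two data points, so every such line is tried (O(n^3) work)."""
    best = (float("inf"), 0.0, 0.0)
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            if xs[i] == xs[j]:
                continue
            slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
            icpt = ys[i] - slope * xs[i]
            loss = 0.0
            for x, y in zip(xs, ys):
                u = y - icpt - slope * x
                loss += tau * u if u >= 0.0 else (tau - 1.0) * u
            if loss < best[0]:
                best = (loss, icpt, slope)
    return best[1], best[2]

def trim_level(scores, cutoff, keep):
    """Level t such that about a share `keep` of scores above `cutoff` are >= t."""
    above = sorted(s for s in scores if s > cutoff)
    k = int((1.0 - keep) * len(above))
    return above[k] if above else cutoff

def three_step(xs, ys, tau=0.5, keep1=0.8, keep2=0.9):
    # Step 1: logistic fit of the censoring indicator, then trimmed selection.
    ds = [1 if y > 0.0 else 0 for y in ys]
    a, b = fit_logit(xs, ds)
    p_hat = [1.0 / (1.0 + math.exp(-(a + b * x))) for x in xs]
    level1 = trim_level(p_hat, 1.0 - tau, keep1)       # plays the role of 1 - tau + c
    s1 = [i for i, p in enumerate(p_hat) if p >= level1]
    # Step 2: quantile regression on the selected subsample.
    beta0 = fit_qr([xs[i] for i in s1], [ys[i] for i in s1], tau)
    # Step 3: reselect with the fitted quantile line and refit.
    fit0 = [beta0[0] + beta0[1] * x for x in xs]
    delta_n = trim_level(fit0, 0.0, keep2)
    s2 = [i for i, f in enumerate(fit0) if f >= delta_n]
    beta1 = fit_qr([xs[i] for i in s2], [ys[i] for i in s2], tau)
    return {"beta0": beta0, "beta1": beta1, "c": level1 - (1.0 - tau),
            "delta_n": delta_n, "n_step2": len(s1), "n_step3": len(s2)}

# Design A with true median coefficients (0, 1); error variance 3 is an assumption.
rng = random.Random(1)
xs = [rng.gauss(1.0, 1.0) for _ in range(200)]
ys = [max(x + rng.gauss(0.0, math.sqrt(3.0)), 0.0) for x in xs]
result = three_step(xs, ys)
```

The brute-force solver is chosen only to keep the sketch dependency-free; for anything beyond a bivariate toy example, a dedicated quantile regression routine would be the natural choice.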

These figures are representative of the principle, and they seem to accord well with the
asymptotic results described earlier. The logistic classifier, albeit an incorrect model for
$h(x)$, does well in separating out a good subset of observations: those with $x_i > 0$, that is, $X_i'\beta(\tfrac12) > 0$. The
initial inefficient estimator $\hat\beta_0(\tfrac12)$ is fairly reasonable. It serves as a classifier for the next
step that separates a larger subset of good observations. Thus, the last step estimator uses
a progressively larger sample and pulls itself closer to the truth. This agrees well with the
Monte-Carlo results discussed in the appendix.

Finally, the last column plots simple diagnostics that we found helpful. The column plots
the logistic fit $p(\tilde x'\hat\gamma)$ vs. quantile fits $x'\hat\beta_0(\tau) - C_i$ (circles) and $x'\hat\beta_1(\tau) - C_i$ (triangles).
The idea is to account for the concordance of different classifiers in choosing the trimming
constants and deciding on the sensibility of the estimates. Each of the classifiers tries to
separate out a set of observations for which $X_i'\beta(\tau) > C_i$ or $h(X_i) > 1 - \tau$. As a conservative
classifier, the first step $p(\tilde x'\hat\gamma)$, with the addition of trimming, should be seen as one selecting
a smaller set of observations. The subsequent classifiers should confirm all or almost all of
this initial set.

The classifier concordance is presented by the placement of points in different quadrants of
the plots A.4-C.4. For example, on the plot B.4, a large proportion of observations for which
$p(\tilde X_i'\hat\gamma) > 1 - \tau$ is confirmed by the quantile classifiers, as seen in the upper-right quadrant.
However, those points in the right-bottom corner are dis-confirmed. Nonetheless, most of
these are trimmed out by the trimming hurdle $c$, $p(\tilde X_i'\hat\gamma) > 1 - \tau + c$. Hence the discordance
"from-the-right" is reduced or eliminated by such trimming. The left-bottom corners of A.4-
B.4 represent the points agreeably disqualified by both the logistic and quantile classifiers.

The upper-left corners represent the discordance of the logistic and quantile classifier
"from-the-left". In principle, this type of dissonance is not as pernicious as the one
"from-the-right", since the quantile classifier is meant to asymptotically separate a larger, better subset of
observations. However, we find it prudent to reduce discordance "from-the-left" by adding
the hurdle $\delta_n$. For example, on the plot A.4 the initial quantile classifier is overly optimistic
in selecting a superset, rather than a subset of $\{i : x_i > 0\}$. The trimming factor $\delta_n$ reduces
the discordance, and helps select a more appropriate set. In practice, since both the envelope
and the quantile models are approximate models of reality, it appears prudent to reduce the
discordance "from-the-left" as well.
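The quadrant counts behind such a concordance plot can be tabulated directly. The sketch below is one possible bookkeeping, assuming the logistic fit is on the horizontal axis and the quantile fit (net of the censoring point) on the vertical axis; the function name, the key names, and the default hurdles are hypothetical.

```python
def concordance_table(p_fit, q_fit, tau, c=0.0, delta=0.0):
    """Count observations in the four quadrants of a concordance plot.

    p_fit: fitted propensity scores from the first-step classifier.
    q_fit: fitted quantile index minus the censoring point.
    c, delta: trimming hurdles for the logistic and quantile classifiers.
    """
    table = {"both": 0, "from_the_right": 0, "from_the_left": 0, "neither": 0}
    for p, q in zip(p_fit, q_fit):
        by_logit = p > 1.0 - tau + c     # selected by the envelope
        by_quantile = q > delta          # selected by the quantile fit
        if by_logit and by_quantile:
            table["both"] += 1           # upper-right quadrant
        elif by_logit:
            table["from_the_right"] += 1 # right-bottom: not confirmed
        elif by_quantile:
            table["from_the_left"] += 1  # upper-left quadrant
        else:
            table["neither"] += 1        # left-bottom quadrant
    return table
```

Recomputing the table over a small grid of `c` and `delta` values shows how quickly each kind of discordance is trimmed away.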

Overall, if the concordance plots reveal a very drastic disagreement, it should serve as a
sensible warning to revise the trimming constants, envelope models, or the models in question
all at once.

Figure 2.


3.6.2. The Stanford Example. This section considers the well-known Stanford heart transplant
data set. The dataset is built into S+ (heart). Out of the 69 patients who received heart
transplants, 45 had died by the closing date and are uncensored. Thus 35% of the observations
are censored. The censoring times for the post-transplant survival are random and observable.
We shall contrast our results with the well-known and thorough study of Aitkin, Laird, and
Francis (1983), and follow their definition of variables:

•  (log)  survival  time,  dependent  variable:  the  log  of  the  difference  between  the  death  and 
the  transplant  time.  Censoring  times  are  log  differences  between  the  last  follow-up  and  the 
transplant  dates. 

• Age: Age of the patient at the time of acceptance into the program.

• Acc: Years since January 1, 1967 to acceptance into the program. This regressor may
be seen as representing technological progress.

•  Surgery:  indicator  of  a  previous  open  heart  surgery. 
We estimate the CQR model:

$$Q_{Y|X,C}(\tau) = (\alpha(\tau) + X'\theta(\tau)) \wedge C, \qquad (13)$$

where $C$ denotes the observable random censoring time. A benchmark model from survival
analysis studied in Aitkin, Laird, and Francis (1983) is the accelerated failure time model:

$$Q_{Y|X,C}(\tau) = (\alpha + \sigma F^{-1}(\tau) + X'\theta) \wedge C, \quad \forall \tau, \qquad (14)$$

where $F^{-1}(\tau)$ is the inverse of (a) the extreme value (Gumbel) or (b) the standard normal
distribution function. Model (a) is thus a proportional hazard (PH) model with Weibull
baseline hazard and model (b) is a non-PH AFT model. Among many models considered in
Aitkin, Laird, and Francis (1983), these seemed to fit best. For comparisons, we shall use the
estimates of $\theta$ in model (14) reported in Aitkin, Laird, and Francis (1983), table 5. As we
discussed, another way to robustly estimate $\theta$ in model (14) is to take any $\hat\theta(\tau)$, say the median,
or average over several quantile regression estimates $\hat\theta(\tau)$ with different indices $\tau$.
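The location-shift structure of model (14) can be made concrete in a few lines. The sketch below ignores censoring (the minimum with $C$) and uses hypothetical function names and parameter values; the extreme value quantile function is the one for the minimum extreme value distribution, whose c.d.f. is $1 - \exp(-e^w)$.

```python
import math
from statistics import NormalDist

def extreme_value_inv_cdf(tau):
    """Quantile function of the standard minimum extreme value distribution."""
    return math.log(-math.log(1.0 - tau))

def normal_inv_cdf(tau):
    """Quantile function of the standard normal distribution."""
    return NormalDist().inv_cdf(tau)

def aft_quantile(tau, x, alpha, sigma, theta, inv_cdf):
    """Uncensored part of (14): alpha + sigma * F^{-1}(tau) + x'theta."""
    return alpha + sigma * inv_cdf(tau) + sum(t * v for t, v in zip(theta, x))

def shift_effect(tau, x0, x1, alpha, sigma, theta, inv_cdf):
    """Change in the tau-th quantile when covariates move from x0 to x1."""
    return (aft_quantile(tau, x1, alpha, sigma, theta, inv_cdf)
            - aft_quantile(tau, x0, alpha, sigma, theta, inv_cdf))
```

Because $\tau$ enters only through the intercept term, `shift_effect` returns the same number for every `tau`, whereas in the CQR model (13) the slope $\theta(\tau)$ is free to vary with the quantile index.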

Because there are only a few uncensored observations on patients with prior surgery, we
could estimate the CQR model with the surgery regressor only up to the 50th percentile.
To discuss higher quantiles, we also estimated the CQR model without the surgery regressor.

Figure 3 presents the results. The first four plots [read by rows] show the estimates of
intercept coefficients, $\alpha(\tau)$, and of slope coefficients, $\theta(\tau)$, for $\tau$ ranging from .1 to .5, for
the model with age, acc and surgery as regressors. The shaded areas represent the pointwise
80% confidence intervals. The bottom three figures plot the quantile coefficient estimates for
the model with age and acc as regressors. The dotted lines present the Aitkin, Laird, and
Francis (1983) estimates of quantile treatment/shift effects, $\theta$, in the model (14) with Weibull
hazards (coefficients of PH model divided by the shape coefficient). The dashed lines are the
estimates of quantile shift effect, $\theta$, in the normal version of (14). Note that the lines are
horizontal since location-shift models impose constant treatment effects across quantiles.

Comparing the quantile shift estimates $\hat\theta(\tau)$ in the CQR model with those in the models
(14), $\theta$, across $\tau$, we observe much qualitative and quantitative similarity. At the median, for
example, the technological effects (acc) are both small and insignificant. The effects of age
are both negative and significant, and the effects of previous surgery are both positive and
significant. In other words, prior surgery and younger age seem to prolong the post-transplant
survival. Overall, it is an encouraging fact that the 3-step CQR estimator produces results
which compare well with a well-known, high-quality study.

Furthermore, as the CQR model (13) nests model (14) as a special case, we may confirm
the general validity of the models considered in Aitkin, Laird, and Francis (1983), at least
concerning the quantile treatment effects. In particular, the effects of prior surgery appear
to be constant across all estimated quantiles. The effect of the time between January 1,
1967 and acceptance into the program is small and insignificant at all quantiles. However,
we should point out that the age effects differ across quantiles. For low survival quantiles
the age treatment effects are positive, small, and statistically insignificant. For middle and
higher survival quantiles the age effects are negative and significant. Yet this variation does
not appear to (statistically) negate the location-shift models of Aitkin, Laird, and Francis
(1983). Of course these findings warrant further careful examination of the age effects once
more data becomes publicly available. Note that all of these quantitative and qualitative
conclusions are well sustained when the surgery indicator is omitted from the regression. See the
bottom row in Figure 3.

Finally, the agreement of classifiers seems to be fairly high, as presented in the last two
columns of the second row in Figure 3. We plotted the logistic fit $p(\tilde X_i'\hat\gamma)$ against the quantile
fit $C_i - X_i'\hat\beta_1(\tau)$. There are only a few discordant observations "from-the-right" or
"from-the-left", almost all of which are eliminated by trimming.

Figure 3.


4.  Determinants  of  Extramarital  Affairs:  A  CQR  Analysis 

Extramarital affairs, an important social phenomenon, have received much attention from
anthropologists, psychologists, evolutionary biologists, sociologists, and economists. See Cronk
(1991), DeLamater (1981), Fair (1978), Miller and Klein (1981), South and Lloyd (1995),
Reiss, Anderson, and Sponaugle (1980) and many references therein. We present a
retrospective analysis of the Redbook data-set on extramarital affairs. We shall mainly contrast
our analysis with both the data and the model-analytic findings of Fair (1978). Fair (1978)
presents a utility-based optimization model of the time spent in the affair (affair intensity)
as determined by preference for diversity, value of goods consumed in and outside marriage,
labor and non-labor income, and time already spent with the spouse and paramour.

4.1. Data. The dataset was collected by Redbook magazine. Fair (1978) and DeLamater
(1981) describe the collection procedures as well as the place of this data-set among
only a few similar data-sets. The data-set covers 6388 first-time married women, of which
68.5% reported having had no extramarital affairs. This presents a very high degree of
"censoring". We define all the variables as in Fair (1978) in order to facilitate comparisons
with his statistical and model-analytic findings:

• Intensity of Affair, dependent variable, defined as the number of different partners outside
marriage times the approximate number of relationships with each partner, divided by the
number of years in marriage. For 68.5% of respondents, it is equal to zero. For the rest
of respondents, the density function is sketched by the histogram and a kernel estimator in
figure 4.

Figure 4. Affair Intensity.

Simple  histograms  of  the  following  regressors  are  given  in  figure  5: 

•  Marriage  Rating:  respondents'  rating  of  their  marriage,  on  the  scale  from  1  to  5. 

•  Age,  Years  Married,  No.  of  Children 

• Religiosity: respondents' rating of their religiosity, on the scale from 1 to 4.

•  Education:  9.0,  12.0,  14.0:  grade  school,  high  school,  and  some  college;  16.0,  17.0,  20.0: 
college  graduate,  some  graduate  school,  and  advanced  degree. 

• Occupation, Husband's occupation: Hollingshead's socio-economic status of occupation:
6: professional with advanced degree, 5: managerial, administrative, business, 4: teacher,
artist, etc., 3: white collar (administrative, clerical), 2: blue collar (farming, factory), 1:
student.


4.2.  Models.  The  CQR  model  assumes  the  following  form  (with  C,  =  0): 

QYlx{T)  =  {air)  +  X'_,e{T))yO. 


(15) 


That is, the conditional quantile function of the affair level is either zero or linear. This
functional form is appealing, as we have discussed. We also consider a standard normal
model:

$$Q_{Y|X}(\tau) = \left(\alpha + \sigma\Phi^{-1}(\tau) + X_{-1}'\theta\right) \vee 0, \quad \forall \tau, \qquad (16)$$

where $\Phi^{-1}(\tau)$ is the inverse of the standard normal distribution function. Another benchmark model
is the accelerated failure time model (for $\exp(Y)$) from survival analysis:

$$Q_{Y|X}(\tau) = \left(\alpha + \sigma F^{-1}(\tau) + X_{-1}'\theta\right) \vee 0, \quad \forall \tau, \qquad (17)$$

where $F$ is an unspecified distribution function. It is easy to estimate the quantile shift effects
$\theta$ in this model by taking or averaging any of the estimates of $\theta(\tau)$ in the CQR model. This
will not be necessary, since neither this nor the normal model is supported by the data.
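To make the restriction in (16) concrete, the following minimal sketch evaluates the censored conditional quantile function of the normal model at a few quantile indices. The parameter values and the function name `normal_model_quantile` are hypothetical and chosen only for illustration. Wherever the quantile is not censored at zero, a unit change in the regressor shifts every quantile by the same amount $\theta$, which is exactly the location-shift property that the CQR model (15) relaxes by letting $\theta(\tau)$ vary with $\tau$.

```python
from statistics import NormalDist

def normal_model_quantile(tau, x, alpha, sigma, theta):
    """Censored conditional quantile under the normal model (16):
    Q(tau | x) = max(alpha + sigma * Phi^{-1}(tau) + x'theta, 0)."""
    latent = alpha + sigma * NormalDist().inv_cdf(tau) + sum(
        xj * tj for xj, tj in zip(x, theta))
    return max(latent, 0.0)

# Hypothetical parameter values, for illustration only.
alpha, sigma, theta = -1.0, 2.0, [0.5]

for tau in (0.5, 0.7, 0.9):
    q_lo = normal_model_quantile(tau, [1.0], alpha, sigma, theta)
    q_hi = normal_model_quantile(tau, [2.0], alpha, sigma, theta)
    # The last column is the effect of moving x from 1 to 2 at this quantile.
    print(tau, round(q_lo, 3), round(q_hi, 3), round(q_hi - q_lo, 3))
```

At quantiles where both values are positive, the printed difference equals the single slope `theta[0]`; at quantiles where the latent index is negative, the quantile is censored at zero and the difference shrinks.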

4.3. Estimation and Model Comparisons. To construct the initial envelope/classifier
we examined the pairwise plots of $Y$ vs $X$. Many of the covariates appeared to be associated
with higher dispersion of $Y$, which led us to consider a number of polynomial powers in
the logistic model $p(\tilde{x}'\gamma)$: $\tilde{x}$ consisted of $x_{(j)}$, $x_{(j)}^2$, $x_{(j)}^3$, and certain interactions $x_{(i)}x_{(j)}$ that
appeared to significantly improve the fit. The dimension of $\tilde{x}$ was 18, which is plausible in view of
the large sample size. Sensitivity of the final estimates to further increasing the complexity
of the envelope was negligible. The trimming constant $c$ was set to $c \approx .1$ according to the
rule described in the appendix. $\gamma$ was estimated by conditional MLE.

Due to heavy censoring it was not possible to estimate all quantile coefficients $\theta(\tau)$.
Identification depended on the nondegeneracy of the selected design matrix. This condition
prevented considering quantiles lower than .5.

The results are summarized graphically in Figure 6. The solid line denotes the 3-step
estimates of $\beta(\tau) = (\alpha(\tau), \theta(\tau)')'$, $\tau \in \{.4, \ldots, .9\}$, and the shaded region depicts the pointwise
90% confidence intervals. The dashed line presents the ML estimates $\hat\theta$ of the quantile shift
effects in the normal model (16) obtained by Fair (1978).

The estimates $\hat\theta(\tau)$ vary significantly across quantiles, especially at higher ones. This is an evident
violation of the homoskedasticity assumption. Therefore, neither model (16) nor model (17) is
supported by the data. It is still interesting to comment briefly on the behavior of the ML
estimates of the normal model. It is well known that these estimates are not robust to violations
of either normality (e.g. heavy tails) or homoskedasticity; see Goldberger (1983) and Hurd
(1979) for proofs and simulation studies. In our example, we have both. It is interesting that
for five out of eight variables, the estimates $\hat\theta_j$ seem to correspond to fairly extreme quantile
estimates $\hat\theta_j(\tau)$, $\tau \approx .9$. For the marriage longevity variable, on the other hand, the estimate
$\hat\theta_j$ is far away from any of the $\hat\theta_j(\tau)$. Furthermore, in several cases the sign of $\hat\theta_j(\tau)$ changes across
$\tau$, so $\hat\theta_j$, understandably, cannot even match the direction of the quantile shift effect: if
the normal model (16) or the location-shift model (17) were adequate, it would have been the
case that $\theta(\tau) \approx \theta$ for all $\tau$. Thus, $\hat\theta_j$ can hardly be given any meaning in the present setting.
This finding is an empirical illustration of the earlier discussion on the breadth and flexibility
of the CQR model.

Finally, the last row in Figure 6 presents the concordance plots. Overall, the concordance
appears to be good. Only a fairly small proportion of observations is discordant "from-the-left",
a major chunk of which is eliminated by the additional trimming. Discordance "from-the-right"
is also mildly present, and part of it is also eliminated by trimming.

4.4. Analysis. The most exciting matter is the interpretation of the estimated quantile shift
effects $\hat\theta(\tau)$ (cf. Figure 6).

Religiosity  effect  is  expectedly  negative  at  all  quantiles  of  affair  intensity  and  especially 
strong  at  very  high  quantiles.  As  sociologists  emphasized,  the  institutions  and  norms  of 
Judeo-Christian doctrine have had a major influence on American families, endorsing a
procreational and strictly-within-the-marriage orientation towards sexuality (e.g. DeLamater
(1981)).  This  thesis  is  well  supported  by  the  data. 

Education  quantile  shift  effects,  on  the  other  hand,  are  more  engaging.  The  effects  are 
negative  and  strongly  negative  at  high  quantiles.  Note  that  education  and  religiosity  are 
weakly  correlated  (.14),  but  since  we  condition  on  religiosity,  the  education  effects  are  net 
of this and other factors. Education effects are inexplicable within Fair's model, yet
they  appear  to  have  a  clear  meaning  in  view  of  the  relational  (rather  than  recreational) 
perspectives  towards  a  paramour  among  the  more  intelligent,  educated,  and  not  necessarily 
religious  individuals  (Reiss  (1980),  DeLamater  (1981)). 

The quantile treatment effects for age are nonpositive across all presented quantiles. Age
effects are negative at the middle quantile and strongly negative at high quantiles. This
means that the younger respondents are more likely to engage in an affair, especially in very
intense ones, holding everything else fixed. In Fair's analytic model, diversity considerations
("variety is a spice of life") are incorporated in the utility but do not shed much light on
the life-cycle considerations. On the other hand, the elaboration by DeLamater (1981) on
the life-cycle perspective dynamics, within the social institutions and norms, conforms to the
present data-analytic findings.

Women with an occupation of higher socio-economic status are relatively more likely to
engage in affairs, especially more intense ones. Explanations for this remain to be explored.
One view is that such status creates an interactional advantage, increasing the hazard of an
affair and subsequent marital dissolution (South and Lloyd (1995)). Fair's analytic model does
not necessarily yield predictions about the direction of the status effect (because he treats
the status as a proxy for labor income). To the extent that higher status is associated with
non-labor income, or to the degree that income effects dominate substitution effects, Fair's
model may predict a positive effect.

Husband's occupational status has a very small positive or negligible effect across almost all
quantiles, except at very extreme ones, where it becomes very negative (it is positive but
insignificant at .95). Besides an (anecdotal) explanation that good men pick good wives,
women may value statusful husbands and pursue affairs, if at all, so as to keep the risk
of marital dissolution optimally low. On the other hand, Fair's analytic model predicts
positive effects of a husband's status (income) on the affair level, as a higher value of goods
consumed in marriage causes wives to substitute labor activities for time spent with family
and paramour. The Fair model, however, ignores the negative value of the dissolution option,
which is real as other studies point out (South and Lloyd (1995)). Our quantile regression
results suggest that Fair's model explains only the middle quantiles, but does not apply to
the high quantiles. It seems plausible that incorporating the dissolution risk in Fair's
model, more along the lines of the Becker (1968) crime and punishment model, can make it
conform to the present findings. Finally, it is also not implausible that more intelligent or
statusful husbands can have better unobserved detection rates and the observed effect is the
interaction of this hidden detection ability with the "cheating ability".

The effect of marriage longevity is slightly positive at .6-.8 quantiles and strongly negative
at  high  quantiles.  Fair,  using  behavioral  considerations,  postulates  that  marriage  longevity 
may  positively  relate  to  diversity  quest,  leading  to  an  increased  affair  level.  However,  it  is 
not  entirely  clear  why  the  effect  is  very  negative  at  high  quantiles.  This  may  relate  to  the 
fact  that  only  married  and  undivorced  respondents  were  selected  to  the  sample,  so  that  the 
marriage  longevity  correlates  with  the  marriage  match  quality  and  thus  has  a  deleterious 
effect  on  affairs.  Such  an  outcome  would  be  a  clear  prediction  of  the  search  (for  spousal 
alternatives)  theory.  Our  finding  thus  partially  disconfirms  both  the  analytic  and  statistical 
predictions  of  Fair  (1978)  derived  from  the  normal  model. 

5.  Discussion 

This paper had three goals. The first was to offer both theoretical and empirical
perspectives on the global CQR models as a useful way to approximate the quantile
treatment/shift effects in censored regression settings. The treatment variety is richer than the
simple location-shift effects assumed by many commonly used models. In the CQR models,
the covariates and the treatment can affect such features as scale, skewness, kurtosis, or,
generally, the entire shape of the conditional distribution.

The second goal was to offer an empirical CQR analysis of the determinants of a very
serious social phenomenon: extramarital affairs. This is an important topic within the
sociology, psychology, and economics of marriage and family. To our regret, we found no previous
implementable estimators that can be used in the settings of heavy censoring, many
polychotomous or continuous regressors, and large or small samples. Such data sets seem to
prevail in many areas of applied statistics. This justified the pursuit of a practical,
implementable, well-behaved estimator. The suggested estimator can be used to robustly estimate
the global CQR models as well as many traditional models. Studying this estimator in large
and finite samples, in the well-known Stanford heart transplant data-set, and developing the
diagnostic tools, was the third goal.
Appendix

Below, $C$, const, and $K$ are generic positive constants. For notational simplicity, we set $C_i = 0$. The general case is very
similar.


Appendix  A.    Proof  of  Theorem  1. 
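Part 1. First consider $\hat\beta_0(\tau)$. The rescaled statistic $Z_n^0 = \sqrt{n}(\hat\beta_0(\tau) - \beta(\tau))$ minimizes

$$Q_n(z, \hat\gamma) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} V_{in}(z)\, 1[p(X_i'\hat\gamma) > 1 - \tau + c], \quad \text{where}$$

$V_{in}(z) = \sqrt{n}[\rho_\tau(\epsilon_i - X_i'z/\sqrt{n}) - \rho_\tau(\epsilon_i)]$ and $\epsilon_i = Y_i - X_i'\beta(\tau) = \max(u_i, -X_i'\beta(\tau))$. The claim is, for finite
$k$:

$$(Q_n(z_j, \hat\gamma),\ j \le k) \to_d (Q_\infty(z_j),\ j \le k), \quad \text{where} \qquad (18)$$

$Q_\infty(z) = W'z + \frac{1}{2}z'Jz$, $W = N(0, \Lambda)$, $J = E f_u(0|X_i) X_i X_i' 1[p(X_i'\gamma_0) > 1 - \tau + c]$, $\Lambda = \tau(1-\tau) E X_i X_i' 1[p(X_i'\gamma_0) > 1 - \tau + c]$.
$J$ is invertible by conditions (ii) and (iv). Since $Q_n$ and $Q_\infty$ are convex, finite, and continuous in
$z$, and since $Q_\infty$ is uniquely minimized at $-J^{-1}W = O_p(1)$, (18) implies

$$Z_n^0 \to_d -J^{-1}W$$

by the convexity theorem (Knight (1999), Theorem 5; also see Pollard (1991)). But if $\hat\gamma = \gamma_0$, (18) follows by
LLN, CLT, and some standard calculations, so it remains only to verify that

$$Q_n(z, \hat\gamma) - Q_n(z, \gamma_0) \to_p 0 \quad \text{for any fixed } z. \qquad (19)$$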

(a) For any fixed $z$, $\{Q_n(z,\gamma) - EQ_n(z,\gamma),\ \gamma \in \mathcal{G}\}$ is stochastically equicontinuous in $\gamma$, where $\mathcal{G} = \{\gamma : |\gamma - \gamma_0| < \delta\}$ and $\delta > 0$ is small. Indeed, Type I functions (bounded variation functions of a single index, a
VC subgraph class) in Andrews (1994) include the class $\mathcal{F} = \{x \mapsto 1[p(x'\gamma) > 1 - \tau + c],\ \gamma \in \mathcal{G}\}$, hence by
Theorem 2 of Andrews (1994) it satisfies Pollard's entropy condition with a constant envelope. This property
is retained by the product of $\mathcal{F}$ with the random variable $V_{in}(z)$, $V_{in}(z) \otimes \mathcal{F}$, by Theorem 3 in Andrews (1994),$^9$
since by assumption (i) $|V_{in}(z)|$ has a constant envelope:

$$|V_{in}(z)| \le 2|X_i'z| \le \text{const}. \qquad (20)$$

Hence (a) is verified by Theorem 1 in Andrews (1994).

(b) Space $\mathcal{G}$ with pseudo-metric

$$\begin{aligned}
\rho(\gamma_1,\gamma_2) &= \sup_{n,\, i \le n} \left( E\left| V_{in}(z) \times \left[1\{p(X_i'\gamma_1) > 1-\tau+c\} - 1\{p(X_i'\gamma_2) > 1-\tau+c\}\right] \right|^2 \right)^{1/2} \\
&\le C \times \Big| P(X_i'\gamma_1 > p^{-1}[1-\tau+c]) - P(X_i'\gamma_1 > p^{-1}[1-\tau+c],\ X_i'\gamma_2 > p^{-1}[1-\tau+c]) \\
&\qquad + P(X_i'\gamma_2 > p^{-1}[1-\tau+c]) - P(X_i'\gamma_1 > p^{-1}[1-\tau+c],\ X_i'\gamma_2 > p^{-1}[1-\tau+c]) \Big|^{1/2} \\
&\le \text{const} \times \|\gamma_2 - \gamma_1\|^{1/2}
\end{aligned} \qquad (21)$$

is totally bounded, where the first inequality follows from (20), and the second one follows from assumption
(iv) by bounding each of the two terms, for example:

$$\begin{aligned}
&\left| P(X_i'\gamma_1 > p^{-1}[1-\tau+c]) - P(X_i'\gamma_1 > p^{-1}[1-\tau+c],\ X_i'\gamma_2 > p^{-1}[1-\tau+c]) \right| \\
&\quad \le \left| P(X_i'\gamma_1 > p^{-1}[1-\tau+c]) - P(X_i'\gamma_1 > p^{-1}[1-\tau+c],\ X_i'\gamma_1 + X_i'(\gamma_2 - \gamma_1) > p^{-1}[1-\tau+c]) \right| \\
&\quad \le \left| P(X_i'\gamma_1 > p^{-1}[1-\tau+c]) - P(X_i'\gamma_1 - K\|\gamma_2 - \gamma_1\|_2 > p^{-1}[1-\tau+c]) \right| \\
&\quad \le C\|\gamma_1 - \gamma_2\|_2,
\end{aligned}$$

where $|X_i'(\gamma_1 - \gamma_2)| \le K\|\gamma_1 - \gamma_2\|_2$, since $X_i$ has a compact support $\mathbf{X}$.

$^9$ We note that Andrews allows for triangular sequences, which allows the r.v. $V_{in}(z)$ to be indexed by $n$.
Otherwise, we would have to refer to van der Vaart and Wellner (1996) 2.11.3 for dealing with functional
classes indexed by $n$. Note that convexity allows us to treat $V_{in}(z)$ as a r.v. and not a function of $z$, in the
sense that $z$ is fixed in the above arguments.

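(a) and (b) together imply that (e.g. Andrews (1994) equations 3.34 and 3.36):

$$\sup_{|\gamma - \gamma_0| \to 0} \left| Q_n(z,\gamma) - Q_n(z,\gamma_0) - EQ_n(z,\gamma) + EQ_n(z,\gamma_0) \right| = o_p(1).$$

Thus, to complete the proof of (19), it remains only to show that

$$\left| EQ_n(z,\gamma) - EQ_n(z,\gamma_0) \right|_{\gamma = \hat\gamma} = o_p(1). \qquad (22)$$

We will show that for $s_i(\gamma,\gamma_0) = 1[p(X_i'\gamma) > 1 - \tau + c] - 1[p(X_i'\gamma_0) > 1 - \tau + c]$:

$$EQ_n(z,\gamma) - EQ_n(z,\gamma_0)\big|_{\gamma = \hat\gamma} = \sqrt{n}\, E V_{in}(z) s_i(\gamma,\gamma_0)\big|_{\gamma = \hat\gamma} = O_p(\hat\gamma - \gamma_0). \qquad (23)$$

Write $\sqrt{n}V_{in}(z) = -\sqrt{n}[\{\tau - 1[\epsilon_i < 0]\}X_i'z] + \sqrt{n}[-\eta_i(z)\{X_i'z - \epsilon_i\sqrt{n}\}] = \sqrt{n}V'_{in}(z) + \sqrt{n}V''_{in}(z)$,
where $\eta_i(z) = [1(\epsilon_i < 0) - 1(\epsilon_i < X_i'z/\sqrt{n})]$. Set $\gamma$ close enough to $\gamma_0$. Then $p(X_i'\gamma) > (1 - \tau) + c$
implies $h(X_i) > (1 - \tau) + \nu'$, and so does $p(X_i'\gamma_0) > (1 - \tau) + c$, by assumptions (i) and (iii). Hence
$s_i(\gamma,\gamma_0) \ne 0$ necessarily implies $h(X_i) > (1 - \tau) + \nu'$, which implies $X_i'\beta(\tau) > \nu''$, a.s. for $\nu', \nu'' > 0$ small,
for all $i$. Hence, as $\gamma$ gets close to $\gamma_0$,

$$E[\sqrt{n}V'_{in}(z)s_i(\gamma,\gamma_0)|X_i] = E[\sqrt{n}V'_{in}(z)1(X_i'\beta(\tau) > \nu'')|X_i] \times s_i(\gamma,\gamma_0) = 0, \quad \text{uniformly in } i, \qquad (24)$$

since $P[\epsilon_i < 0|X_i, X_i'\beta(\tau) > \nu''] = \tau$ [if $X_i'\beta(\tau) > 0$, $\epsilon_i = \max(u_i, -X_i'\beta(\tau))$ has $\tau$-th conditional quantile at
0]. Also

$$\begin{aligned}
E[\sqrt{n}V''_{in}(z)s_i(\gamma,\gamma_0)|X_i] &= E[\sqrt{n}V''_{in}(z)1(X_i'\beta(\tau) > \nu'')|X_i] \times s_i(\gamma,\gamma_0) \\
&= O[f_u(0|X_i)\, z'X_iX_i'z\, 1(X_i'\beta(\tau) > \nu'')] \times s_i(\gamma,\gamma_0), \quad \text{uniformly in } i
\end{aligned} \qquad (25)$$

[the second line follows by the standard calculations]. Therefore by assumptions (iii) and Lipschitz condition
(iv):

$$E\,E[\sqrt{n}V''_{in}(z)s_i(\gamma,\gamma_0)|X_i] = O(E[s_i(\gamma,\gamma_0)]) = O(\gamma - \gamma_0). \qquad (26)$$

(24) and (26) imply (22).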
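The split of $V_{in}(z)$ into the first-order term $V'_{in}(z)$ and the remainder $V''_{in}(z)$ used in Part 1 is an exact algebraic identity for the check function $\rho_\tau(u) = u(\tau - 1[u < 0])$, and it can be verified numerically. The sketch below is only an illustrative check under assumed inputs: the function names are hypothetical, `xz` stands in for $X_i'z$, and the draws of $\epsilon_i$ and $X_i'z$ are arbitrary.

```python
import math
import random

def rho(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def v_in(eps, xz, n, tau):
    """V_in(z) = sqrt(n) * [rho(eps - x'z / sqrt(n)) - rho(eps)]."""
    t = xz / math.sqrt(n)
    return math.sqrt(n) * (rho(eps - t, tau) - rho(eps, tau))

def v_prime(eps, xz, tau):
    """First-order term V'_in(z) = -(tau - 1[eps < 0]) * x'z."""
    return -(tau - (1.0 if eps < 0 else 0.0)) * xz

def v_double_prime(eps, xz, n):
    """Remainder V''_in(z) = -eta * (x'z - eps * sqrt(n)),
    with eta = 1[eps < 0] - 1[eps < x'z / sqrt(n)]."""
    t = xz / math.sqrt(n)
    eta = (1.0 if eps < 0 else 0.0) - (1.0 if eps < t else 0.0)
    return -eta * (xz - eps * math.sqrt(n))

rng = random.Random(0)
n, tau = 400, 0.7
max_gap = 0.0
for _ in range(10_000):
    eps = rng.gauss(0.0, 1.0)      # arbitrary residual draw
    xz = rng.uniform(-3.0, 3.0)    # stands in for X_i'z
    total = v_prime(eps, xz, tau) + v_double_prime(eps, xz, n)
    max_gap = max(max_gap, abs(v_in(eps, xz, n, tau) - total))

# The largest discrepancy should be at the level of floating-point rounding.
print(max_gap)
```

The same functions can be used to confirm the envelope bound (20), since $|V_{in}(z)|$ never exceeds $2|X_i'z|$ for any draw.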

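Part 2. It suffices to show the result for $\hat\beta_1(\tau)$ with $\hat\beta_0(\tau)$ as the selector. The proof for $\hat\beta_l(\tau)$, $l > 1$,
is identical. The proof for Part 2 is similar to that of Part 1, so only important differences are given. Notation
undefined here is in Part 1.

The rescaled statistic $Z_n^1 = \sqrt{n}(\hat\beta_1(\tau) - \beta(\tau))$ minimizes

$$Q_n(z, \hat\beta_0(\tau), \delta_n) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} V_{in}(z)\, 1[X_i'\hat\beta_0(\tau) > \delta_n].$$

Consider $\delta_n$ as a parameter sequence. Proceed identically as in Part 1 up to equation (19), only replacing
$1[p(X_i'\hat\gamma) > 1 - \tau + c]$ by $1[X_i'\hat\beta_0(\tau) > \delta_n]$, $1[p(X_i'\gamma_0) > 1 - \tau + c]$ by $1[X_i'\beta(\tau) > 0]$, and $W$, $J$, $\Lambda$ by $W = N(0, H_0^{-1}\Lambda_0 H_0^{-1})$, $H_0$, $\Lambda_0$. It remains to show

$$Q_n(z, \hat\beta_0(\tau), \delta_n) - Q_n(z, \beta(\tau), 0) \to_p 0, \quad \text{for any fixed } z. \qquad (27)$$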

(a) For any fixed $z$, $\{Q_n(z,\beta,\delta) - EQ_n(z,\beta,\delta),\ (\beta,\delta) \in \mathcal{B} \times \mathcal{D}\}$ is stochastically equicontinuous in $(\beta,\delta)$,
where $\mathcal{B} = \{\beta : |\beta - \beta(\tau)| < C'\}$, $\mathcal{D} = \{\delta : 0 \le \delta \le C''\}$ and $C', C'' > 0$ are small. Indeed, Type I functions
in Andrews (1994) (VC subgraph classes) contain $\mathcal{F} = \{x \mapsto 1[x'\beta > \delta],\ (\beta,\delta) \in \mathcal{B} \times \mathcal{D}\}$. Hence $\mathcal{F}$ has
a finite uniform covering entropy integral and a constant envelope, i.e. satisfies Pollard's entropy condition.
This property is retained by the product of $\mathcal{F}$ with the random variable $V_{in}(z)$: $V_{in}(z) \otimes \mathcal{F}$, by Theorem 3
in Andrews (1994), since $|V_{in}(z)|$ has a constant envelope. (a) is verified by Theorem 1 in Andrews (1994).

(b) Space $\mathcal{B} \times \mathcal{D}$ is totally bounded under the $L_2$ pseudometric:

$$\begin{aligned}
\rho((\beta_1,\delta_1),(\beta_2,\delta_2)) &= \sup_{n,\, i \le n} \left( E\left| V_{in}(z) \times \left[1\{X_i'\beta_1 > \delta_1\} - 1\{X_i'\beta_2 > \delta_2\}\right] \right|^2 \right)^{1/2} \\
&\le \text{const} \times \|\beta_2 - \beta_1\|^{1/2} + \text{const} \times |\delta_2 - \delta_1|^{1/2},
\end{aligned} \qquad (28)$$

where (28) follows from (20) and from assumption (iv), analogously to the proof of (21), treating $\delta$ as shifting
the intercept parameter.

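(b), along with (a), implies (e.g. Andrews (1994) eq. (3.34) and (3.36)):

$$\sup_{|\beta - \beta(\tau)| \to 0} \left| Q_n(z,\beta,\delta_n) - Q_n(z,\beta(\tau),0) - EQ_n(z,\beta,\delta_n) + EQ_n(z,\beta(\tau),0) \right| = o_p(1).$$

Thus, to complete the proof of (27), it remains to show that

$$\left| EQ_n(z,\beta,\delta_n) - EQ_n(z,\beta(\tau),0) \right|_{\beta = \hat\beta_0(\tau)} = o_p(1). \qquad (29)$$

Let $s_i(\beta,\beta(\tau)) = 1(X_i'\beta > \delta_n) - 1(X_i'\beta(\tau) > 0)$. By assumption on the sequence $\delta_n$, w.p. $\to 1$, $\hat\beta_0(\tau)$ is
inside the ball with radius $\kappa'\delta_n$, centered at $\beta(\tau)$, where $\kappa' > 0$ is small. By the compactness assumption on
$X_i$, $\kappa'$ can be chosen so that w.p. $\to 1$, $\sup_{x \in \mathbf{X}} |x'(\hat\beta_0(\tau) - \beta(\tau))| < \kappa\delta_n$. So set $\beta$ inside this ball. Then
for small enough $\kappa$ chosen as such, $x'\beta > \delta_n$ implies $x'\beta(\tau) > 0$, and $x'\beta(\tau) \le 0$ implies $x'\beta \le \delta_n$. Thus
$s_i(\beta,\beta(\tau)) \ne 0$ necessarily implies $X_i'\beta(\tau) > 0$ a.s. Hence, uniformly in $i$,

$$E[\sqrt{n}V'_{in}(z)s_i(\beta,\beta(\tau))|X_i] = E[\sqrt{n}V'_{in}(z)1(X_i'\beta(\tau) > 0)|X_i] \times s_i(\beta,\beta(\tau)) = 0, \qquad (30)$$

since $P[\epsilon_i < 0|X_i, X_i'\beta(\tau) > 0] = \tau$. Also

$$\begin{aligned}
E[\sqrt{n}V''_{in}(z)s_i(\beta,\beta(\tau))|X_i] &= E[\sqrt{n}V''_{in}(z)1(X_i'\beta(\tau) > 0)|X_i] \times s_i(\beta,\beta(\tau)) \\
&= O[f_u(0|X_i)\, z'X_iX_i'z\, 1(X_i'\beta(\tau) > 0)] \times s_i(\beta,\beta(\tau)),
\end{aligned} \qquad (31)$$

uniformly in $i$ [the last equality again follows by the standard calculations]. Therefore by assumptions (iii)
and Lipschitz condition (iv)

$$E\,E[\sqrt{n}V''_{in}(z)s_i(\beta,\beta(\tau))|X_i] = O[E s_i(\beta,\beta(\tau))] = O(\beta - \beta(\tau)) + O(\delta_n). \qquad (32)$$

(30) and (31) give (29).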

Part 3. Finally, to show joint convergence of $Z_n(\tau_j) = \sqrt{n}(\hat\beta(\tau_j) - \beta(\tau_j))$, $j \le J$, for either step, proceed as
before by defining the variable $Z_n = (Z_n(\tau_j),\ j \le J)$ as minimizing the objective function $G_n(z) = \sum_{j \le J} Q_n(z_j)$
over $z = (z_j,\ j \le J)$, where $Q_n$ are objective functions defined as in Part 1 or Part 2. Repeat the arguments in
Parts 1 and 2, noting that a sum of s.e. processes is s.e. (see e.g. Knight (1999)).


Appendix  B.  Practical  Details  of  Estimation  and  Inference 

Inference: Because of distributional equivalence with the Powell CQR estimator, all of the inference
procedures developed by Powell apply without modifications. The bootstrap inference of Bilias, Chen, and Ying
(2000) can be extremely useful in practice. All inference procedures are as for the standard quantile regression
procedure with the only difference being that the selected (rather than complete) sample is used. Therefore, a
user of the standard QR software need not make any modification: the standard errors and confidence intervals
produced by the last step QR routine are all valid.
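The sequence of steps analysed in Appendix A (a probability model for non-censoring, a quantile regression on the sample selected by that model, and a further quantile regression on the sample selected by the fitted index) can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' implementation. The simulated design, the constants `c` and `delta_n`, and the helper names `fit_logit`, `check_loss` and `quantile_regression` are all hypothetical. The logit is fitted by plain gradient ascent rather than a packaged MLE routine, and the quantile regression is solved by brute-force search over lines through pairs of data points, which is exact only for an intercept plus one regressor. Standard errors are not computed.

```python
import itertools
import math
import random

def check_loss(beta, data, tau):
    """Sum of rho_tau(y - b0 - b1 * x) over (x, y) pairs."""
    total = 0.0
    for x, y in data:
        u = y - beta[0] - beta[1] * x
        total += u * (tau - (1.0 if u < 0 else 0.0))
    return total

def quantile_regression(data, tau):
    """Exact check-loss minimizer for intercept + one slope.
    Some minimizer interpolates two data points, so all pairs are tried."""
    best, best_loss = None, float("inf")
    for (x1, y1), (x2, y2) in itertools.combinations(data, 2):
        if x1 == x2:
            continue
        slope = (y2 - y1) / (x2 - x1)
        beta = (y1 - slope * x1, slope)
        loss = check_loss(beta, data, tau)
        if loss < best_loss:
            best, best_loss = beta, loss
    return best

def fit_logit(xs, ds, iters=2000):
    """Logit for P(d = 1 | x), fitted by gradient ascent on the log-likelihood."""
    a = b = 0.0
    step = 4.0 / sum(1.0 + x * x for x in xs)  # safe step for a concave objective
    for _ in range(iters):
        ga = gb = 0.0
        for x, d in zip(xs, ds):
            r = d - 1.0 / (1.0 + math.exp(-(a + b * x)))
            ga += r
            gb += r * x
        a += step * ga
        b += step * gb
    return a, b

# Simulated data (hypothetical design): Y = max(1 + x + u, 0), censored at 0.
rng = random.Random(1)
n, tau = 150, 0.5
c, delta_n = 0.05, 0.1  # hypothetical trimming constants
xs = [rng.uniform(-2.0, 2.0) for _ in range(n)]
ys = [max(1.0 + x + rng.gauss(0.0, 1.0), 0.0) for x in xs]

# Step 1: probability model for "not censored", then select J0.
a, b = fit_logit(xs, [1.0 if y > 0 else 0.0 for y in ys])
J0 = [(x, y) for x, y in zip(xs, ys)
      if 1.0 / (1.0 + math.exp(-(a + b * x))) > 1.0 - tau + c]

# Step 2: ordinary quantile regression on J0 gives the initial estimate.
beta0 = quantile_regression(J0, tau)

# Step 3: reselect using the fitted index and rerun the quantile regression.
J1 = [(x, y) for x, y in zip(xs, ys) if beta0[0] + beta0[1] * x > delta_n]
beta1 = quantile_regression(J1, tau)
print(len(J0), len(J1), beta0, beta1)
```

In applied work one would replace `quantile_regression` with a standard QR routine run on the selected observations, which also supplies the standard errors and confidence intervals discussed above.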

Model $p(\cdot)$ and Trimming Constant: The parametric model $p$ in step 1 should be reasonably good. Goodness of
fit checks are supplied in all the standard statistical packages and are easy to carry out. It is always possible
to achieve a reasonably good fit using a series approximation of the probability model. The theory requires $c$ to be
not very small, although the Monte-Carlo work shows that $c = 0$ is also quite sensible. In practice, one should
check sensitivity of the estimates $\hat\beta_0(\tau)$ to increasing the constant $c$. Also, $c$ shouldn't be too big, for we may
select a very small $J_0$. The better the parametric fit is, the smaller $c$ can be set, improving efficiency of the initial
$\hat\beta_0$. A sensible rule for choosing $c$ is to compare the size of the selected sample $J(c) = \{i : p(X_i'\hat\gamma) > 1 - \tau + c\}$
for $c = 0$ and other values. Choosing $c$ so that $1 - \tau + c$ is the $q$-th quantile of all $p(X_i'\hat\gamma)$ such that $p(X_i'\hat\gamma) > 1 - \tau$ appears to be sound, as
it gives control of the percentage of observations from $J(0)$ that can be thrown out: $\#J(c)/\#J(0) = 1 - q$. This
rule, with $q = 10\%$, was employed in simulations. We also set $\delta_n$ corresponding to $q = 5\%$.
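The trimming rule can be illustrated with a short sketch. It rests on one interpretation that should be read as an assumption: $c$ is chosen so that the cutoff $1 - \tau + c$ sits at the $q$-th empirical quantile of the fitted probabilities that exceed $1 - \tau$, so that a fraction $q$ of $J(0)$ is discarded. The fitted probabilities below are random placeholders, and the function names `trimming_constant` and `selected_sample` are hypothetical.

```python
import random

def trimming_constant(p_hat, tau, q):
    """Return c such that 1 - tau + c lies at the q-th empirical quantile
    of the fitted probabilities exceeding 1 - tau."""
    above = sorted(p for p in p_hat if p > 1.0 - tau)
    k = int(q * len(above))  # number of observations to drop from J(0)
    if k == 0:
        return 0.0
    # Midpoint between the last dropped and first kept value avoids boundary ties.
    cutoff = 0.5 * (above[k - 1] + above[k])
    return cutoff - (1.0 - tau)

def selected_sample(p_hat, tau, c):
    """Indices in J(c) = {i : p_hat[i] > 1 - tau + c}."""
    return [i for i, p in enumerate(p_hat) if p > 1.0 - tau + c]

rng = random.Random(0)
p_hat = [rng.random() for _ in range(200)]  # placeholder fitted probabilities
tau, q = 0.7, 0.10

c = trimming_constant(p_hat, tau, q)
J_0 = selected_sample(p_hat, tau, 0.0)
J_c = selected_sample(p_hat, tau, c)

# The ratio #J(c) / #J(0) should be close to 1 - q, up to integer rounding.
print(len(J_0), len(J_c), round(len(J_c) / len(J_0), 3))
```

Comparing `len(J_c)` with `len(J_0)` for a few values of `q` is a simple way to carry out the sensitivity check recommended above.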

Appendix  C.  Monte-Carlo  Experiments 


Finally, Table I reports the results of a small Monte Carlo experiment.$^{10}$ The model we consider is a
standard location-scale model with an error term hit by a linear-quadratic heteroscedastic scale: for $X_i =
(1, \mathcal{X}_i')'$, $Y_i^* = X_i'\beta + \epsilon_i$. We draw $\mathcal{X}_i \in R^5$ from independent standard normal distributions, truncated
by $\{\mathcal{X}_i : \|\mathcal{X}_i\| < 2\}$. The error term has the multiplicative heteroscedasticity structure: for $u_i \sim N(0, 25)$,
$\epsilon_i = u_i \times (1 + 0.5\sum_{j=1}^{5}(x_{ij} + x_{ij}^2))$. The true parameter vector is chosen at $(1, 1, .5, -1, -.5, .25)$, and the
censoring point is $-0.75$. We used $X$ and $X^2$ in the parametric propensity score regressions. We experimented
with different probability models including logit, probit and linear models. However, the type of probability
model used has very little effect on the performance of the estimators. Therefore, in Table I we only report the
results for the probit selection model. Note that due to the heteroscedastic error structure, even the probit
model is not consistent with the true propensity score. We report the initial step, the first step and the third
step estimators, and compare them to the Buchinsky and Hahn (1998) estimator and the Powell estimator.
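The simulation design just described can be sketched as a data generator. Several details are assumptions rather than facts taken from the text: $N(0, 25)$ is read as variance 25 (standard deviation 5), the truncation uses the Euclidean norm, the scale function is taken to be $1 + 0.5\sum_j (x_{ij} + x_{ij}^2)$, and the names `draw_regressors` and `draw_observation` are hypothetical. The sketch only generates censored data and reports the censoring share; it does not reproduce Table I.

```python
import math
import random

BETA = (1.0, 1.0, 0.5, -1.0, -0.5, 0.25)  # intercept followed by five slopes
CENSOR_POINT = -0.75

def draw_regressors(rng):
    """Five independent N(0, 1) draws, redrawn until the Euclidean norm is below 2."""
    while True:
        x = [rng.gauss(0.0, 1.0) for _ in range(5)]
        if math.sqrt(sum(v * v for v in x)) < 2.0:
            return x

def draw_observation(rng):
    """Return (x, y) with y = max(y_star, CENSOR_POINT)."""
    x = draw_regressors(rng)
    u = rng.gauss(0.0, 5.0)  # assumption: N(0, 25) means variance 25
    # Assumed linear-quadratic scale; it is at least 0.375, hence positive.
    scale = 1.0 + 0.5 * sum(v + v * v for v in x)
    y_star = BETA[0] + sum(b * v for b, v in zip(BETA[1:], x)) + u * scale
    return x, max(y_star, CENSOR_POINT)

rng = random.Random(0)
sample = [draw_observation(rng) for _ in range(400)]
share_censored = sum(y == CENSOR_POINT for _, y in sample) / len(sample)
print(round(share_censored, 3))
```

A full replication would wrap this generator in a loop over replications and apply each estimator to every simulated sample.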

Notably, the results from the Monte Carlo simulation show that for sample size 100, the step-3 ($l = 1$)
estimator outperforms all other estimators in terms of root mean square error (RMSE). Its mean absolute
deviation (MAE) and both mean and median biases are comparable to those of other estimators. Iterating to step 5 ($l = 5$)
increases both root mean square error and mean absolute deviation, with the benefit of reducing both mean and
median biases. Overall the step 5 estimator still compares favorably to both the Buchinsky and Hahn (1998)
estimator and the Powell estimator. The initial inefficient estimator ($l = 0$) does fairly poorly, as expected. In
a large sample, $n = 400$, the 3-step estimator and the Buchinsky and Hahn (1998) estimator perform equally well
and are both more favorable than other estimators in all dimensions. To conclude, the 3-step estimator does
better in a small sample, and very well in a large sample.